
std Namespace Reference


class  UnigramTextClassifier
 A text classifier based on single characters. The basic idea: texts from the same class tend to have similar character (byte) frequencies. In information-theoretic terms, texts from the same class should require a similar number of bits to encode under a perfect encoding built from those frequencies. We don't actually have to create the encoding; we only need the number of bits. The basic methods are learn (read a corpus and count the frequencies), dump (save the frequencies to a stream) and read (read the frequencies from a stream).
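
As a rough illustration of the idea described above, here is a minimal sketch and not the class's actual implementation. The helper names count_bytes, bits_to_encode and closest_class are hypothetical, and the add-one smoothing over 256 byte values is an assumption made so that unseen bytes still get a finite cost. The typedef mirrors the documented one but is declared outside namespace std.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Same shape as the documented typedef, declared at global scope for this sketch.
typedef std::map<unsigned char, unsigned long> frequency_map;

// Hypothetical "learn" step: count how often each byte occurs in a text.
frequency_map count_bytes(const std::string& text) {
    frequency_map freq;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        ++freq[static_cast<unsigned char>(text[i])];
    }
    return freq;
}

// Bits an ideal code derived from `model` would need for `text`.
// Assumption: add-one smoothing over all 256 possible byte values.
double bits_to_encode(const frequency_map& model, const std::string& text) {
    unsigned long total = 0;
    for (frequency_map::const_iterator it = model.begin(); it != model.end(); ++it) {
        total += it->second;
    }
    const double denom = static_cast<double>(total) + 256.0;
    double bits = 0.0;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        frequency_map::const_iterator it = model.find(static_cast<unsigned char>(text[i]));
        const double count = (it == model.end()) ? 0.0 : static_cast<double>(it->second);
        bits -= std::log((count + 1.0) / denom) / std::log(2.0);
    }
    return bits;
}

// Index of the model under which `text` is cheapest to encode.
std::size_t closest_class(const std::vector<frequency_map>& models, const std::string& text) {
    assert(!models.empty());  // precondition: at least one class model
    std::size_t best = 0;
    double best_bits = bits_to_encode(models[0], text);
    for (std::size_t i = 1; i < models.size(); ++i) {
        const double bits = bits_to_encode(models[i], text);
        if (bits < best_bits) {
            best_bits = bits;
            best = i;
        }
    }
    return best;
}
```

Only the bit counts are computed here; no actual encoder is built, which matches the remark above that the number of bits is all that is needed.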


typedef map< unsigned char, unsigned long >  frequency_map

Typedef Documentation

typedef map<unsigned char,unsigned long> std::frequency_map

frequency_map is a map from unsigned characters (bytes) to their frequencies (occurrence counts).
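
The following sketch shows one way a map of this type could be written to and read back from a stream. The function names dump_frequencies and read_frequencies are hypothetical, and the text format (one "byte-value count" pair per line, in decimal) is an assumption; the format actually used by UnigramTextClassifier is not documented here.

```cpp
#include <istream>
#include <map>
#include <ostream>
#include <sstream>

// Same shape as the documented typedef, declared at global scope for this sketch.
typedef std::map<unsigned char, unsigned long> frequency_map;

// Assumed format: one "<byte value> <count>" pair per line, both in decimal.
void dump_frequencies(const frequency_map& freq, std::ostream& out) {
    for (frequency_map::const_iterator it = freq.begin(); it != freq.end(); ++it) {
        // Cast to unsigned int so the key prints as a number, not a raw character.
        out << static_cast<unsigned int>(it->first) << ' ' << it->second << '\n';
    }
}

// Reads pairs until the stream ends or a malformed entry is found.
frequency_map read_frequencies(std::istream& in) {
    frequency_map freq;
    unsigned int byte_value = 0;
    unsigned long count = 0;
    while (in >> byte_value >> count) {
        if (byte_value > 255) {
            break;  // not a valid byte value; stop reading
        }
        freq[static_cast<unsigned char>(byte_value)] = count;
    }
    return freq;
}
```

Because std::map keeps its keys sorted, the dumped lines appear in increasing byte-value order, so two equal maps always produce identical output.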

Generated on Fri Aug 8 15:44:40 2003 for UnigramTextClassifier by doxygen 1.3.3