FreeLing  4.0
freeling::lang_ident Class Reference

Class "lang_ident" checks a text against all known languages and sorts the results by probability.

#include <lang_ident.h>


Public Member Functions

 lang_ident ()
 Build an empty language identifier.
 lang_ident (const std::wstring &cfgfile)
 Build a language identifier, reading options from the given configuration file.
 ~lang_ident ()
 Destructor.
void add_language (const std::wstring &modelfile)
 Load a language from the given model file and add it to the known languages.
void train_language (const std::wstring &textfile, const std::wstring &modelfile, const std::wstring &code, size_t order)
 Train a model for a language, store it in modelfile, and add it to the list of known languages.
std::wstring identify_language (const std::wstring &text, const std::set< std::wstring > &ls=std::set< std::wstring >()) const
 Classify the input text and return the code of the best language among those in the given set.
void rank_languages (std::vector< std::pair< double, std::wstring > > &result, const std::wstring &text, const std::set< std::wstring > &ls=std::set< std::wstring >()) const
 Classify the input text and return the code and perplexity for each language in the given set.

Private Member Functions

void language_perplexities (std::vector< std::pair< double, std::wstring > > &, const std::wstring &, const std::set< std::wstring > &) const
 Fill a vector with unsorted perplexities for each language in the given set.

Private Attributes

std::map< std::wstring, idioma * > idiomes
 List of known languages and language models.
std::set< std::wstring > all_known_languages

Detailed Description

Class "lang_ident" checks a text against all known languages and sorts the results by probability.

It creates an instance of "idioma" for each known language, and checks input text against all existing instances.


Constructor & Destructor Documentation

freeling::lang_ident::lang_ident ( )

Build an empty language identifier.

freeling::lang_ident::lang_ident ( const std::wstring &  cfgfile)

Build a language identifier, reading options from the given configuration file.

freeling::lang_ident::~lang_ident ( )

Destructor.


Member Function Documentation

void freeling::lang_ident::add_language ( const std::wstring &  modelfile)

Load a language from the given model file and add it to the known languages.

std::wstring freeling::lang_ident::identify_language ( const std::wstring &  text,
const std::set< std::wstring > &  ls = std::set< std::wstring >() 
) const

Classify the input text and return the code of the best language among those in the given set.

If the set is empty, all known languages are considered. If no language reaches the threshold, "none" is returned.
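The selection rule described above can be illustrated with a small standalone sketch. It does not use FreeLing itself: `pick_best`, its `scores` input, and the explicit `threshold` parameter are hypothetical names introduced here. The sketch also assumes that lower perplexity means a better match and that a language "reaches the threshold" when its perplexity is at most `threshold`; the actual criterion used by the library may differ.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of FreeLing: choose the best language code
// from (perplexity, code) pairs, restricted to the codes in `ls`.
// Assumptions: lower perplexity is better, and a language "reaches the
// threshold" when its perplexity is at most `threshold`.
std::wstring pick_best(const std::vector<std::pair<double, std::wstring>> &scores,
                       const std::set<std::wstring> &ls,
                       double threshold) {
  assert(threshold >= 0.0);
  std::wstring best = L"none";   // returned when nothing reaches the threshold
  double best_score = threshold;
  for (const auto &s : scores) {
    // An empty set means "consider every known language".
    if (!ls.empty() && ls.count(s.second) == 0) continue;
    if (s.first <= best_score) {
      best_score = s.first;
      best = s.second;
    }
  }
  return best;
}
```

With real FreeLing, the corresponding call would be `identify_language(text, ls)` on a configured `lang_ident` instance; the sketch only mirrors the documented contract (empty set means all languages, "none" when no candidate qualifies).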

void freeling::lang_ident::language_perplexities ( std::vector< std::pair< double, std::wstring > > &  ,
const std::wstring &  ,
const std::set< std::wstring > &   
) const [private]

Fill a vector with unsorted perplexities for each language in the given set.
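This page does not define perplexity. As a reminder of the textbook definition only, the sketch below computes the perplexity of a sequence from the probabilities a model assigned to each of its symbols, as the exponential of the negative mean log-probability. The function name `perplexity` and its input format are assumptions made for illustration; how FreeLing's "idioma" models actually compute or normalize the value is not stated here.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Textbook perplexity: exp(-(1/N) * sum(log p_i)), where p_i is the
// probability the model assigned to the i-th symbol of the sequence.
// Illustrative only; not taken from the FreeLing sources.
double perplexity(const std::vector<double> &probs) {
  assert(!probs.empty());
  double log_sum = 0.0;
  for (double p : probs) {
    assert(p > 0.0 && p <= 1.0);  // each entry must be a valid probability
    log_sum += std::log(p);
  }
  return std::exp(-log_sum / static_cast<double>(probs.size()));
}
```

Under this definition, a model that assigns higher probabilities to the text yields a lower perplexity, which is why lower values indicate a better-fitting language model.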

void freeling::lang_ident::rank_languages ( std::vector< std::pair< double, std::wstring > > &  result,
const std::wstring &  text,
const std::set< std::wstring > &  ls = std::set< std::wstring >() 
) const

Classify the input text and return the code and perplexity for each language in the given set. The (perplexity, code) pairs are stored in result.

If the set is empty, all known languages are considered.
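The result type, a vector of (perplexity, code) pairs, sorts conveniently because `std::pair` compares its first member first. The standalone sketch below shows that filter-then-sort pattern. `rank_sketch`, the `ranking` alias, and the precomputed `perplexities` map are hypothetical stand-ins rather than FreeLing code, and ascending order (best match first) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using ranking = std::vector<std::pair<double, std::wstring>>;

// Hypothetical stand-in for the ranking step; not FreeLing code.
// `perplexities` maps language code -> perplexity for one input text.
ranking rank_sketch(const std::map<std::wstring, double> &perplexities,
                    const std::set<std::wstring> &ls) {
  ranking result;
  for (const auto &p : perplexities) {
    // An empty set means "consider every known language".
    if (ls.empty() || ls.count(p.first) > 0)
      result.emplace_back(p.second, p.first);
  }
  // std::pair orders by its first member, so this sorts by perplexity,
  // ascending (assumed here to mean best match first).
  std::sort(result.begin(), result.end());
  assert(std::is_sorted(result.begin(), result.end()));
  return result;
}
```

Note that the real `rank_languages` returns its output through the `result` reference parameter instead of a return value.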

void freeling::lang_ident::train_language ( const std::wstring &  textfile,
const std::wstring &  modelfile,
const std::wstring &  code,
size_t  order 
)

Train a model for a language, store it in modelfile, and add it to the list of known languages.


Member Data Documentation

std::set<std::wstring> freeling::lang_ident::all_known_languages [private]
std::map<std::wstring,idioma*> freeling::lang_ident::idiomes [private]

List of known languages and language models.

