FreeLing
4.0
Class "lang_ident" checks a text against all known languages and sorts the results by probability.
#include <lang_ident.h>
Public Member Functions
lang_ident ()
Build an empty language identifier.
lang_ident (const std::wstring &cfgfile)
Build a language identifier, reading options from the given file.
~lang_ident ()
Destructor.
void add_language (const std::wstring &modelfile)
Load a language from the given model file and add it to the known languages.
void train_language (const std::wstring &textfile, const std::wstring &modelfile, const std::wstring &code, size_t order)
Train a model for a language, store it in modelfile, and add it to the list of known languages.
std::wstring identify_language (const std::wstring &text, const std::set< std::wstring > &ls=std::set< std::wstring >()) const
Classify the input text and return the code of the best language among those in the given set.
void rank_languages (std::vector< std::pair< double, std::wstring > > &result, const std::wstring &text, const std::set< std::wstring > &ls=std::set< std::wstring >()) const
Classify the input text and return the code and perplexity for each language in the given set.
Private Member Functions
void language_perplexities (std::vector< std::pair< double, std::wstring > > &, const std::wstring &, const std::set< std::wstring > &) const
Fill a vector with unsorted perplexities for each language in the given set.
Private Attributes
std::map< std::wstring, idioma * > idiomes
List of known languages and language models.
std::set< std::wstring > all_known_languages
Class "lang_ident" checks a text against all known languages and sorts the results by probability.
It creates an instance of "idioma" for each known language, and checks input text against all existing instances.
freeling::lang_ident::lang_ident ()
Build an empty language identifier.
freeling::lang_ident::lang_ident (const std::wstring &cfgfile)
Build a language identifier, reading options from the given file.
freeling::lang_ident::~lang_ident ()
Destructor.
void freeling::lang_ident::add_language (const std::wstring &modelfile)
Load a language from the given model file and add it to the known languages.
std::wstring freeling::lang_ident::identify_language (const std::wstring &text, const std::set< std::wstring > &ls = std::set< std::wstring >()) const
Classify the input text and return the code of the best language among those in the given set.
If the set is empty, all known languages are considered. If no language reaches the threshold, "none" is returned.
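The selection rule described for identify_language can be illustrated without the library. The following is a minimal, self-contained sketch, not FreeLing's implementation: `pick_best` and its `threshold` parameter are hypothetical names, and the sketch assumes that a lower perplexity means a better match and that the threshold is an upper bound on acceptable perplexity.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of FreeLing): given (perplexity, code) pairs,
// return the code with the lowest perplexity that does not exceed `threshold`,
// or L"none" when no candidate qualifies.
// Assumption: lower perplexity = better match.
std::wstring pick_best(const std::vector<std::pair<double, std::wstring> > &scores,
                       double threshold) {
  assert(threshold >= 0.0);  // perplexities are non-negative
  std::wstring best = L"none";
  double best_score = 0.0;
  bool found = false;
  for (const auto &s : scores) {
    // Keep a candidate only if it passes the threshold and beats the current best.
    if (s.first <= threshold && (!found || s.first < best_score)) {
      best = s.second;
      best_score = s.first;
      found = true;
    }
  }
  return best;
}
```

With an empty candidate list, or when every candidate exceeds the threshold, the sketch returns L"none", mirroring the documented fallback.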
void freeling::lang_ident::language_perplexities (std::vector< std::pair< double, std::wstring > > &, const std::wstring &, const std::set< std::wstring > &) const [private]
Fill a vector with unsorted perplexities for each language in the given set.
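This page does not say how the perplexities are computed. As a reminder of the standard definition only, the sketch below computes perplexity as the exponential of the negative mean log-probability of the observed symbols. The function name `perplexity` and its input format are assumptions for illustration and may not match what FreeLing does internally.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of FreeLing):
// perplexity = exp( -(1/N) * sum_i log p_i ),
// where log_probs holds the natural-log probability a model assigned
// to each of the N observed symbols.
double perplexity(const std::vector<double> &log_probs) {
  assert(!log_probs.empty());  // undefined for an empty sequence
  double sum = 0.0;
  for (double lp : log_probs) sum += lp;
  return std::exp(-sum / static_cast<double>(log_probs.size()));
}
```

Under this definition a model that assigns probability 0.5 to every symbol has perplexity 2, and a model that is always certain has perplexity 1, so smaller values indicate a better fit.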
void freeling::lang_ident::rank_languages (std::vector< std::pair< double, std::wstring > > &result, const std::wstring &text, const std::set< std::wstring > &ls = std::set< std::wstring >()) const
Classify the input text and return the code and perplexity for each language in the given set.
If the set is empty, all known languages are considered.
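The result vector holds (perplexity, code) pairs. A minimal sketch of ordering such a vector is shown below. It is not taken from the FreeLing sources: `scored_lang`, `lower_perplexity`, and `sort_by_perplexity` are hypothetical names, and ascending perplexity (best candidate first) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Same element type as the `result` parameter of rank_languages.
typedef std::pair<double, std::wstring> scored_lang;

// Hypothetical comparator: order candidates by perplexity only.
bool lower_perplexity(const scored_lang &a, const scored_lang &b) {
  return a.first < b.first;
}

// Hypothetical helper (not part of FreeLing): put the lowest perplexity first.
// stable_sort keeps candidates with equal perplexity in their original order.
void sort_by_perplexity(std::vector<scored_lang> &result) {
  std::stable_sort(result.begin(), result.end(), lower_perplexity);
  assert(std::is_sorted(result.begin(), result.end(), lower_perplexity));
}
```

After the call, `result.front().second` is the code of the best-ranked language under the stated assumption.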
void freeling::lang_ident::train_language (const std::wstring &textfile, const std::wstring &modelfile, const std::wstring &code, size_t order)
Train a model for a language, store it in modelfile, and add it to the list of known languages.
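The `order` parameter suggests an n-gram style model, although this page does not describe the model format. Assuming a character n-gram model, the counting step of training could be sketched as follows. `count_ngrams` is a hypothetical helper written for illustration and is not part of the FreeLing API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not part of FreeLing): count every character n-gram
// of length `order` occurring in `text`.
// Assumption: the language model is built from character n-gram statistics.
std::map<std::wstring, std::size_t> count_ngrams(const std::wstring &text,
                                                 std::size_t order) {
  assert(order > 0);
  std::map<std::wstring, std::size_t> counts;
  // Slide a window of `order` characters over the text.
  // Texts shorter than `order` yield no n-grams.
  for (std::size_t i = 0; i + order <= text.size(); ++i)
    ++counts[text.substr(i, order)];
  return counts;
}
```

A real trainer would go on to turn these counts into smoothed probabilities and write them to the model file. Those steps are omitted here because the source does not specify them.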
std::set<std::wstring> freeling::lang_ident::all_known_languages [private]
std::map<std::wstring,idioma*> freeling::lang_ident::idiomes [private]
List of known languages and language models.