Class "lang_ident" checks a text against all known languages and sorts the results by probability.

#include <lang_ident.h>

Public Member Functions
	lang_ident ()
	Build an empty language identifier.
	lang_ident (const std::wstring &cfgfile)
	Build a language identifier, read options from given file.
	~lang_ident ()
	Destructor.
void	add_language (const std::wstring &modelfile)
	Load a language from the given model file and add it to the existing languages.
void	train_language (const std::wstring &textfile, const std::wstring &modelfile, const std::wstring &code, size_t order)
	Train a model for a language, store it in modelfile, and add it to the list of known languages.
std::wstring	identify_language (const std::wstring &text, const std::set< std::wstring > &ls=std::set< std::wstring >()) const
	Classify the input text and return the code of the best language among those in given set.
void	rank_languages (std::vector< std::pair< double, std::wstring > > &result, const std::wstring &text, const std::set< std::wstring > &ls=std::set< std::wstring >()) const
	Classify the input text and store in result the code and perplexity of each language in the given set.
Private Member Functions
void	language_perplexities (std::vector< std::pair< double, std::wstring > > &, const std::wstring &, const std::set< std::wstring > &) const
	Fill a vector with the unsorted perplexities of each language in the given set.
Private Attributes
std::map< std::wstring, idioma * >	idiomes
	List of known languages and language models.
std::set< std::wstring >	all_known_languages

Detailed Description

Class "lang_ident" checks a text against all known languages and sorts the results by probability.

It creates an instance of "idioma" for each known language, and checks input text against all existing instances.

Constructor & Destructor Documentation

freeling::lang_ident::lang_ident ( )

Build an empty language identifier.

freeling::lang_ident::lang_ident ( const std::wstring & cfgfile )

Build a language identifier, read options from given file.

freeling::lang_ident::~lang_ident ( )

Destructor.

Member Function Documentation

void freeling::lang_ident::add_language ( const std::wstring & modelfile )

Load a language from the given model file and add it to the existing languages.

std::wstring freeling::lang_ident::identify_language	(	const std::wstring &	text,
		const std::set< std::wstring > &	ls = `std::set< std::wstring >()`
	)		const

Classify the input text and return the code of the best language among those in given set.

If the set is empty, all known languages are considered. If no language reaches the threshold, "none" is returned.
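
The selection rule described above can be sketched without FreeLing itself. The helper below is hypothetical (pick_best is not a FreeLing function) and rests on two assumptions that this page does not state: that a lower perplexity means a better match, and that the threshold is an upper bound on the perplexity a language may have and still be accepted. It only illustrates the "empty set means all languages" and "none" behaviours.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of FreeLing.
// Assumptions: lower perplexity means a better match, and `threshold` is an
// upper bound on the perplexity a language may have to be accepted.
std::wstring pick_best(const std::vector<std::pair<double, std::wstring>> &scores,
                       double threshold,
                       const std::set<std::wstring> &ls = std::set<std::wstring>()) {
  // Perplexities are positive, so a non-positive bound would reject everything.
  assert(threshold > 0.0);

  const std::pair<double, std::wstring> *best = nullptr;
  for (const auto &s : scores) {
    // An empty set means every known language is a candidate.
    if (!ls.empty() && ls.count(s.second) == 0) continue;
    // Skip languages that do not reach the threshold.
    if (s.first > threshold) continue;
    if (best == nullptr || s.first < best->first) best = &s;
  }
  if (best == nullptr) return L"none";
  return best->second;
}
```

With scores (12.5, "en"), (3.0, "es"), (7.25, "ca") and a threshold of 10.0, this sketch yields "es"; restricting the set to "en" alone yields "none", because 12.5 exceeds the bound.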

void freeling::lang_ident::language_perplexities	(	std::vector< std::pair< double, std::wstring > > &	,
		const std::wstring &	,
		const std::set< std::wstring > &
	)		const `[private]`

Fill a vector with the unsorted perplexities of each language in the given set.

void freeling::lang_ident::rank_languages	(	std::vector< std::pair< double, std::wstring > > &	result,
		const std::wstring &	text,
		const std::set< std::wstring > &	ls = `std::set< std::wstring >()`
	)		const

Classify the input text and store in result the code and perplexity of each language in the given set.

If set is empty, all known languages are considered.
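
As a rough illustration of how a result vector of this shape can be produced from unsorted scores, the sketch below filters by the language set and sorts the pairs. The function rank_sketch and the alias scored are hypothetical names, not FreeLing code, and the ascending order (lowest perplexity first) is an assumption; this page does not state the sort direction.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias for the (perplexity, language code) pairs used on this page.
using scored = std::pair<double, std::wstring>;

// Hypothetical sketch, not the FreeLing implementation.
// Copies into `result` the entries of `unsorted` whose code is in `ls`
// (or all entries when `ls` is empty), then sorts them.
// Assumption: ascending perplexity, so the best candidate comes first.
void rank_sketch(std::vector<scored> &result,
                 const std::vector<scored> &unsorted,
                 const std::set<std::wstring> &ls = std::set<std::wstring>()) {
  result.clear();
  for (const auto &s : unsorted) {
    if (ls.empty() || ls.count(s.second) > 0) result.push_back(s);
  }
  // std::pair compares by first member, then second, so this orders by
  // perplexity and breaks ties by language code.
  std::sort(result.begin(), result.end());
}
```

Putting the perplexity first in the pair is what makes the default pair comparison sufficient for sorting, with no custom comparator.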

void freeling::lang_ident::train_language	(	const std::wstring &	textfile,
		const std::wstring &	modelfile,
		const std::wstring &	code,
		size_t	order
	)

Train a model for a language, store it in modelfile, and add it to the list of known languages.
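
This page does not say what the order parameter controls. Assuming it is the n-gram length of a character-level model (an assumption, not something stated here), the counting step of such a training procedure might look like the sketch below. The function count_ngrams is a hypothetical name, and the sketch says nothing about the model file format or about how FreeLing actually estimates probabilities.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical sketch, not the FreeLing training code.
// Assumption: `order` is the length of the character n-grams to count.
std::map<std::wstring, std::size_t> count_ngrams(const std::wstring &text,
                                                 std::size_t order) {
  assert(order > 0);  // a zero-length n-gram carries no information

  std::map<std::wstring, std::size_t> counts;
  // Slide a window of `order` characters over the text; a text shorter
  // than `order` produces no n-grams at all.
  for (std::size_t i = 0; i + order <= text.size(); ++i) {
    ++counts[text.substr(i, order)];
  }
  return counts;
}
```

For the text "abab" and order 2, the sketch counts "ab" twice and "ba" once.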

Member Data Documentation

std::map<std::wstring,idioma*> freeling::lang_ident::idiomes [private]

List of known languages and language models.

The documentation for this class was generated from the following file: