FreeLing  4.0
Public Member Functions | Static Public Member Functions | Static Private Member Functions | Private Attributes
freeling::idioma Class Reference

Class "idioma" implements a visible Markov's model that calculates the probability that a text is in a certain language. More...

#include <idioma.h>

Collaboration diagram for freeling::idioma:
Collaboration graph
[legend]

List of all members.

Public Member Functions

 idioma (const std::wstring &)
 Constructor, given the model file to load.
 ~idioma ()
double sequence_probability (std::wistream &, size_t &) const
 Calculates the probability that the text is in the instance language.
double compute_perplexity (const std::wstring &) const
 Compute normalized language probability for given string.
std::wstring get_language_code () const
 get iso code for current language
double get_threshold () const
 get maximum allowed perplexity

Static Public Member Functions

static void create_model (const std::wstring &modelFile, std::wistream &f, const std::wstring &code, int order, wchar_t phantom)
 Use given text file to count ngrams and create a model file.

Static Private Member Functions

static std::wstring to_writable (wchar_t c)
 convert a char to a writable represntation for the model file
static std::wstring to_writable (const std::wstring &)
 convert a ngram to a writable represntation for the model file
static void initial_ngram (std::wistream &f, std::wstring &ngram, wchar_t &z, int ord, wchar_t ph)
 Initial ngram: n-1 phantom characters plus the first actual letter.
static void next_ngram (std::wistream &f, std::wstring &ngram, wchar_t &z)
 slide ngram window one position to the left

Private Attributes

std::wstring LangCode
std::map< std::wstring, double > count
 auxiliary for training
wchar_t phantom
 char to use to create initial state n-gram
int order
 order of ngram model
double threshold
 maximum perplexity to accept a sequence
smoothingLD< std::wstring,
wchar_t > * 
smooth

Detailed Description

Class "idioma" implements a visible Markov's model that calculates the probability that a text is in a certain language.


Constructor & Destructor Documentation

freeling::idioma::idioma ( const std::wstring &  )

Constructor, given the model file to load.


Member Function Documentation

double freeling::idioma::compute_perplexity ( const std::wstring &  ) const

Compute normalized language probability for given string.

static void freeling::idioma::create_model ( const std::wstring &  modelFile,
std::wistream &  f,
const std::wstring &  code,
int  order,
wchar_t  phantom 
) [static]

Use given text file to count ngrams and create a model file.

std::wstring freeling::idioma::get_language_code ( ) const

get iso code for current language

get maximum allowed perplexity

static void freeling::idioma::initial_ngram ( std::wistream &  f,
std::wstring &  ngram,
wchar_t &  z,
int  ord,
wchar_t  ph 
) [static, private]

Initial ngram: n-1 phantom characters plus the first actual letter.

static void freeling::idioma::next_ngram ( std::wistream &  f,
std::wstring &  ngram,
wchar_t &  z 
) [static, private]

slide ngram window one position to the left

double freeling::idioma::sequence_probability ( std::wistream &  ,
size_t &   
) const

Calculates the probability that the text is in the instance language.

static std::wstring freeling::idioma::to_writable ( wchar_t  c) [static, private]

convert a char to a writable represntation for the model file

static std::wstring freeling::idioma::to_writable ( const std::wstring &  ) [static, private]

convert a ngram to a writable represntation for the model file


Member Data Documentation

std::map<std::wstring,double> freeling::idioma::count [private]

auxiliary for training

std::wstring freeling::idioma::LangCode [private]

order of ngram model

wchar_t freeling::idioma::phantom [private]

char to use to create initial state n-gram

smoothingLD<std::wstring,wchar_t>* freeling::idioma::smooth [private]
double freeling::idioma::threshold [private]

maximum perplexity to accept a sequence


The documentation for this class was generated from the following file: