FreeLing
4.0
The class hmm_tagger implements the syntactic analyzer and is the main class, which uses all the others.
#include <hmm_tagger.h>
Public Member Functions
hmm_tagger (const std::wstring &, bool, unsigned int, unsigned int kb=1)
Constructor.
~hmm_tagger ()
Destructor.
void annotate (sentence &) const
analyze given sentence
double SequenceProb_log (const sentence &, int k=0) const
Given an *annotated* sentence, compute (log) probability of k-th best sequence according to HMM parameters.
Private Member Functions
bool is_forbidden (const std::wstring &, sentence::const_iterator) const
check if a trigram is in the forbidden list.
double ProbA_log (const bigram &, const bigram &, sentence::const_iterator) const
Compute transition log_probability from state_i to state_j, returning appropriate smoothed values if no evidence is available.
double ProbB_log (const bigram &, const word &) const
Compute emission log_probability for observation obs from state_i.
double ProbPi_log (const bigram &) const
Compute initial log_probability for state_i.
std::list< emission_states > FindStates (const sentence &) const
compute possible emission states for each word in sentence.
Private Attributes
const tagset * Tags
std::map< std::wstring, double > PTag
maps to store the probabilities
std::map< bigram, double > PBg
std::map< std::wstring, double > PTrg
std::map< bigram, double > PInitial
std::map< std::wstring, double > PWord
std::multimap< std::wstring, std::wstring > Forbidden
set of hand-specified forbidden bigram and trigram transitions
double probInitial
double probUnobserved
safe_map< std::wstring, double > * pA_cache
thread-safe probability cache, to speed up computations
unsigned int kbest
number of best paths to compute
double c [3]
coefficients to compute linear interpolation
The class hmm_tagger implements the syntactic analyzer and is the main class, which uses all the others.
freeling::hmm_tagger::hmm_tagger (const std::wstring & hmmFile, bool rtk, unsigned int force, unsigned int kb = 1)
Constructor.
Constructor: Build an HMM tagger, loading probability tables.
References freeling::util::absolute(), freeling::config_file::add_section(), c, freeling::config_file::close(), ERROR_CRASH, Forbidden, freeling::config_file::get_content_line(), freeling::config_file::get_section(), freeling::tagset::get_short_tag(), kbest, freeling::config_file::open(), pA_cache, PBg, PInitial, probInitial, probUnobserved, PTag, PTrg, PWord, Tags, TRACE, freeling::UNOBS_INITIAL_STATE, freeling::UNOBS_WORD, vector2wstring, WARNING, freeling::semgraph::WORD, and wstring2vector.
void freeling::hmm_tagger::annotate (sentence & se) const [virtual]
analyze given sentence
Disambiguate the given sentence with the provided options.
Implements freeling::POS_tagger.
References double2wstring, FindStates(), freeling::tagset::get_short_tag(), int2wstring, kbest, ProbA_log(), ProbB_log(), ProbPi_log(), Tags, TRACE, and freeling::trellis::ZERO_logprob.
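The decoding procedure is not described on this page. As a rough illustration of how an HMM tagger can pick tags in log space, here is a minimal Viterbi sketch. It is a deliberate simplification: states are single tags rather than bigrams, and only the single best path is returned instead of kbest paths. The names TagScorers and viterbi are hypothetical and not part of FreeLing.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// Hypothetical log-score callbacks standing in for the tagger's probability functions.
struct TagScorers {
    std::function<double(const std::string&)> initial;                         // log P(tag at position 0)
    std::function<double(const std::string&, const std::string&)> transition;  // log P(next tag | previous tag)
    std::function<double(const std::string&, const std::string&)> emission;    // log P(word | tag), called as (word, tag)
};

// Return the highest-scoring tag sequence; candidates[i] lists the tags allowed for words[i].
// Returns an empty vector if the input is empty or some word has no candidate tag.
inline std::vector<std::string> viterbi(const std::vector<std::string>& words,
                                        const std::vector<std::vector<std::string>>& candidates,
                                        const TagScorers& s) {
    const std::size_t n = words.size();
    if (n == 0 || candidates.size() != n) return {};
    for (const auto& c : candidates)
        if (c.empty()) return {};

    const double neg_inf = -std::numeric_limits<double>::infinity();
    std::vector<std::vector<double>> delta(n);      // best log-score of a path ending in each tag
    std::vector<std::vector<std::size_t>> back(n);  // index of the best predecessor tag

    for (const auto& t : candidates[0])
        delta[0].push_back(s.initial(t) + s.emission(words[0], t));
    back[0].assign(candidates[0].size(), 0);

    for (std::size_t i = 1; i < n; ++i) {
        for (const auto& t : candidates[i]) {
            double best = neg_inf;
            std::size_t arg = 0;
            for (std::size_t j = 0; j < candidates[i - 1].size(); ++j) {
                const double score = delta[i - 1][j] + s.transition(candidates[i - 1][j], t);
                if (score > best) { best = score; arg = j; }
            }
            delta[i].push_back(best + s.emission(words[i], t));
            back[i].push_back(arg);
        }
    }

    // Pick the best final tag, then follow the back-pointers to the start.
    std::size_t cur = 0;
    for (std::size_t j = 1; j < delta[n - 1].size(); ++j)
        if (delta[n - 1][j] > delta[n - 1][cur]) cur = j;

    std::vector<std::string> path(n);
    for (std::size_t i = n; i-- > 0;) {
        path[i] = candidates[i][cur];
        cur = back[i][cur];
    }
    return path;
}
```

Working in log space turns products of probabilities into sums, which avoids numeric underflow on long sentences.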
list< emission_states > freeling::hmm_tagger::FindStates (const sentence & sent) const [private]
compute possible emission states for each word in sentence.
Obtain a list with the states that *may* have emitted the current observation (a sentence).
References freeling::tagset::get_short_tag(), Tags, and TRACE.
Referenced by annotate().
bool freeling::hmm_tagger::is_forbidden (const std::wstring &, sentence::const_iterator) const [private]
check if a trigram is in the forbidden list.
References Forbidden, freeling::tagset::get_short_tag(), Tags, TRACE, vector2wstring, and wstring2vector.
Referenced by ProbA_log().
double freeling::hmm_tagger::ProbA_log (const bigram & state_i, const bigram & state_j, sentence::const_iterator w) const [private]
Compute transition log_probability from state_i to state_j, returning appropriate smoothed values if no evidence is available.
If the trigram is in the "forbidden" list, the result is probability zero.
References c, double2wstring, safe_map< T1, T2 >::find_safe(), safe_map< T1, T2 >::insert_safe(), is_forbidden(), pA_cache, PBg, PTag, PTrg, and TRACE.
Referenced by annotate(), and SequenceProb_log().
double freeling::hmm_tagger::ProbB_log (const bigram & state_i, const word & obs) const [private]
Compute emission log_probability for observation obs from state_i.
Pb = P(word|state) = P(state|word) * P(word) / P(state)
Since states are bigrams: s = t1.t2
References double2wstring, freeling::word::get_lc_form(), freeling::tagset::get_short_tag(), probUnobserved, PTag, PWord, Tags, and TRACE.
Referenced by annotate(), and SequenceProb_log().
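The Bayes inversion used for the emission probability becomes a sum and a difference in log space. The sketch below shows only that arithmetic, plus a fallback probability for word forms missing from the table. The helper names emission_logprob and word_prob are hypothetical, and the way FreeLing actually applies probUnobserved is an assumption here.

```cpp
#include <cmath>
#include <map>
#include <string>

// Bayes inversion in log space: log P(w|s) = log P(s|w) + log P(w) - log P(s).
inline double emission_logprob(double p_state_given_word, double p_word, double p_state) {
    return std::log(p_state_given_word) + std::log(p_word) - std::log(p_state);
}

// Word probability with a fallback value for forms that were never observed
// (hypothetical helper; the fallback plays the role of probUnobserved).
inline double word_prob(const std::map<std::string, double>& p_word,
                        const std::string& form, double p_unobserved) {
    auto it = p_word.find(form);
    return it == p_word.end() ? p_unobserved : it->second;
}
```

For example, with P(state|word) = 0.5, P(word) = 0.01 and P(state) = 0.1, the emission probability is 0.5 * 0.01 / 0.1 = 0.05, and the function returns its logarithm.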
double freeling::hmm_tagger::ProbPi_log (const bigram & state_i) const [private]
Compute initial log_probability for state_i.
References PInitial, probInitial, and freeling::trellis::ZERO_logprob.
Referenced by annotate(), and SequenceProb_log().
double freeling::hmm_tagger::SequenceProb_log (const sentence & se, int k = 0) const
Given an *annotated* sentence, compute (log) probability of k-th best sequence according to HMM parameters.
Given an *annotated* sentence, compute sequence (log) probability according to HMM parameters.
References freeling::tagset::get_short_tag(), ProbA_log(), ProbB_log(), ProbPi_log(), and Tags.
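Scoring a fixed tag sequence amounts to adding the initial, transition and emission log-probabilities along the sentence. The following sketch illustrates that sum with bigram states. The struct Scorers, the function sequence_logprob and the start marker "0" are hypothetical stand-ins, with the callbacks playing the roles of ProbPi_log, ProbA_log and ProbB_log.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Bigram = std::pair<std::string, std::string>;  // HMM state: (previous tag, current tag)

// Hypothetical stand-ins for the initial, transition and emission scorers.
struct Scorers {
    std::function<double(const Bigram&)> initial;
    std::function<double(const Bigram&, const Bigram&)> transition;
    std::function<double(const Bigram&, const std::string&)> emission;
};

// Sum the log-scores along one fixed tag sequence.
// "start_tag" is an assumed sentence-start marker used to build the first state.
// Returns 0.0 for empty or mismatched input.
inline double sequence_logprob(const std::vector<std::string>& words,
                               const std::vector<std::string>& tags,
                               const Scorers& s,
                               const std::string& start_tag = "0") {
    if (words.empty() || words.size() != tags.size()) return 0.0;
    Bigram prev{start_tag, tags[0]};
    double total = s.initial(prev) + s.emission(prev, words[0]);
    for (std::size_t i = 1; i < words.size(); ++i) {
        Bigram cur{tags[i - 1], tags[i]};
        total += s.transition(prev, cur) + s.emission(cur, words[i]);
        prev = cur;
    }
    return total;
}
```

A sentence of n words contributes one initial term, n emission terms and n-1 transition terms to the total.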
double freeling::hmm_tagger::c[3] [private]
coefficients to compute linear interpolation
Referenced by hmm_tagger(), and ProbA_log().
std::multimap<std::wstring, std::wstring> freeling::hmm_tagger::Forbidden [private]
set of hand-specified forbidden bigram and trigram transitions
Referenced by hmm_tagger(), and is_forbidden().
unsigned int freeling::hmm_tagger::kbest [private]
number of best paths to compute
Referenced by annotate(), and hmm_tagger().
safe_map<std::wstring,double>* freeling::hmm_tagger::pA_cache [private]
thread-safe probability cache, to speed up computations
Referenced by hmm_tagger(), ProbA_log(), and ~hmm_tagger().
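A thread-safe cache of this kind can be sketched as a map guarded by a mutex, consulted before an expensive computation. The class LogProbCache and the helper cached below are hypothetical illustrations of the idea. They are not the FreeLing safe_map, whose find_safe() and insert_safe() signatures are not shown on this page.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical mutex-guarded cache; not the FreeLing safe_map implementation.
class LogProbCache {
public:
    // Copy the cached value into "out" and return true if "key" is present.
    bool find(const std::string& key, double& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = table_.find(key);
        if (it == table_.end()) return false;
        out = it->second;
        return true;
    }
    // Store a value; an existing entry for the same key is left unchanged.
    void insert(const std::string& key, double value) {
        std::lock_guard<std::mutex> lock(mutex_);
        table_.emplace(key, value);
    }
private:
    mutable std::mutex mutex_;
    std::map<std::string, double> table_;
};

// Consult the cache before calling the (possibly expensive) compute function.
template <typename Compute>
double cached(LogProbCache& cache, const std::string& key, Compute compute) {
    double value = 0.0;
    if (cache.find(key, value)) return value;
    value = compute();
    cache.insert(key, value);
    return value;
}
```

Two threads may occasionally compute the same value at the same time in this sketch; that is harmless when the computation is deterministic, since only the first insert takes effect.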
std::map<bigram, double> freeling::hmm_tagger::PBg [private]
Referenced by hmm_tagger(), and ProbA_log().
std::map<bigram, double> freeling::hmm_tagger::PInitial [private]
Referenced by hmm_tagger(), and ProbPi_log().
double freeling::hmm_tagger::probInitial [private]
Referenced by hmm_tagger(), and ProbPi_log().
double freeling::hmm_tagger::probUnobserved [private]
Referenced by hmm_tagger(), and ProbB_log().
std::map<std::wstring, double> freeling::hmm_tagger::PTag [private]
maps to store the probabilities
Referenced by hmm_tagger(), ProbA_log(), and ProbB_log().
std::map<std::wstring, double> freeling::hmm_tagger::PTrg [private]
Referenced by hmm_tagger(), and ProbA_log().
std::map<std::wstring, double> freeling::hmm_tagger::PWord [private]
Referenced by hmm_tagger(), and ProbB_log().
const tagset* freeling::hmm_tagger::Tags [private]
Referenced by annotate(), FindStates(), hmm_tagger(), is_forbidden(), ProbB_log(), SequenceProb_log(), and ~hmm_tagger().