FreeLing
4.0
The probabilities class assigns lexical probabilities to each PoS tag of each word in a sentence.
#include <probabilities.h>
Public Member Functions | |
probabilities (const std::wstring &, double) | |
Constructor. | |
~probabilities () | |
Destructor. | |
void | annotate_word (word &) const |
Assign probabilities for each analysis of given word. | |
void | set_activate_guesser (bool) |
Turn guesser on/off. | |
void | analyze (sentence &) const |
Assign probabilities to tags for each word in sentence. | |
Private Member Functions | |
void | smoothing (word &) const |
Smooth probabilities for the analysis of given word. | |
double | compute_probability (const std::wstring &, double, const std::wstring &) const |
Compute p(tag|suffix) using recursively shorter suffixes. | |
double | guesser (word &, double) const |
Guess possible tags, keeping some mass for previously assigned tags. | |
bool | less (const analysis &a1, const analysis &a2) const |
Compare two analyses to determine their order of preference. | |
void | sort_list (std::list< analysis > &ls) const |
Sort the given analysis list using lemma and PoS preferences. | |
Private Attributes | |
freeling::regexp | RE_PunctNum |
Auxiliary regexps. | |
double | ProbabilityThreshold |
Probability threshold for unknown words tags. | |
const tagset * | Tags |
Tagset description, to compute short versions of tags. | |
double | BiassSuffixes |
Interpolation factor to favor suffix probabilities versus ambiguity-class probabilities when smoothing known but unobserved words. | |
double | LidstoneLambdaLexical |
Lambda parameter for smoothing via Lidstone's law. | |
double | LidstoneLambdaClass |
bool | activate_guesser |
whether to use guesser for unknown words. | |
std::map< std::wstring, double > | single_tags |
unigram probabilities | |
std::map< std::wstring, std::map< std::wstring, double > > | class_tags |
probabilities for usual ambiguity classes | |
std::map< std::wstring, std::map< std::wstring, double > > | lexical_tags |
lexical probabilities for frequent words | |
std::map< std::wstring, double > | unk_tags |
list of tags and probabilities to assign to unknown words | |
std::map< std::wstring, std::map< std::wstring, double > > | unk_suffs |
list of tag frequencies for unknown word suffixes | |
double | theeta |
Smoothing parameter for unknown-word suffixes. | |
std::wstring::size_type | long_suff |
length of longest suffix | |
std::map< std::wstring, std::wstring > | lemma_prefs |
std::map< std::wstring, std::wstring > | pos_prefs |
The probabilities class assigns lexical probabilities to each PoS tag of each word in a sentence.
freeling::probabilities::probabilities | ( | const std::wstring & | probFile, |
double | Threshold | ||
) |
Constructor.
Create a probability assignment module, loading the given probabilities file.
References freeling::util::absolute(), activate_guesser, freeling::config_file::add_section(), BiassSuffixes, class_tags, freeling::config_file::close(), ERROR_CRASH, freeling::config_file::get_content_line(), freeling::config_file::get_section(), lemma_prefs, lexical_tags, LidstoneLambdaClass, LidstoneLambdaLexical, long_suff, freeling::config_file::open(), pos_prefs, ProbabilityThreshold, single_tags, Tags, theeta, TRACE, unk_suffs, unk_tags, and wstring2double.
freeling::probabilities::~probabilities | ( | ) |
Destructor.
References Tags.
void freeling::probabilities::analyze | ( | sentence & | se | ) | const [virtual] |
Assign probabilities to tags for each word in sentence.
Annotate probabilities for each analysis of each word in given sentence, using given options.
Implements freeling::processor.
References annotate_word(), and TRACE_SENTENCE.
void freeling::probabilities::annotate_word | ( | word & | w | ) | const |
Assign probabilities for each analysis of given word.
Annotate probabilities for each analysis of given word.
References activate_guesser, freeling::word::find_tag_match(), freeling::word::found_in_dict(), freeling::word::get_form(), freeling::word::get_n_analysis(), guesser(), freeling::word::has_retokenizable(), RE_PunctNum, freeling::word::select_all_analysis(), smoothing(), sort_list(), and TRACE.
Referenced by analyze().
double freeling::probabilities::compute_probability | ( | const std::wstring & | tag, |
double | prob, | ||
const std::wstring & | s | ||
) | const [private] |
Compute p(tag|suffix) using recursively shorter suffixes.
Compute probability of a tag given a word suffix.
References double2wstring, theeta, TRACE, and unk_suffs.
Referenced by guesser(), and smoothing().
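The recursion over shorter suffixes can be sketched as follows. This is a minimal illustration, not FreeLing's code: the name suffix_probability, the SuffixCounts alias, and the mixing formula (relative frequency at the current suffix combined with the shorter-suffix estimate, weighted by theta) are assumptions in the style of common suffix-based guessers. The table plays the role of unk_suffs and theta the role of theeta.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a suffix table: suffix -> (tag -> count).
using SuffixCounts = std::map<std::wstring, std::map<std::wstring, double>>;

// Estimate p(tag | suffix) by backing off to ever shorter suffixes.
// prior is the estimate used for the empty suffix; theta weights the
// shorter-suffix estimate against the relative frequency at this suffix.
// The formula is an assumption, not taken from the FreeLing sources.
double suffix_probability(const SuffixCounts &table, const std::wstring &tag,
                          double prior, const std::wstring &suffix,
                          double theta) {
  assert(theta >= 0.0);
  if (suffix.empty()) return prior;

  // Recurse first: the shorter suffix drops the leftmost character.
  double shorter =
      suffix_probability(table, tag, prior, suffix.substr(1), theta);

  auto entry = table.find(suffix);
  if (entry == table.end()) return shorter;  // unseen suffix: keep backoff value

  double total = 0.0, count = 0.0;
  for (const auto &kv : entry->second) {
    total += kv.second;
    if (kv.first == tag) count = kv.second;
  }
  if (total <= 0.0) return shorter;

  // Mix the relative frequency here with the shorter-suffix estimate.
  return (count / total + theta * shorter) / (1.0 + theta);
}
```

With theta = 0 the longest observed suffix decides alone; larger values of theta pull the estimate toward what the shorter suffixes and the prior say.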
double freeling::probabilities::guesser | ( | word & | w, |
double | mass | ||
) | const [private] |
Guess possible tags, keeping some mass for previously assigned tags.
References freeling::word::add_analysis(), compute_probability(), double2wstring, freeling::word::get_lc_form(), freeling::word::get_n_analysis(), freeling::tagset::get_short_tag(), freeling::analysis::init(), ProbabilityThreshold, freeling::word::set_analysis(), freeling::analysis::set_prob(), Tags, TRACE, and unk_tags.
Referenced by annotate_word().
bool freeling::probabilities::less | ( | const analysis & | a1, |
const analysis & | a2 | ||
) | const [private] |
Compare two analyses to determine their order of preference.
References freeling::analysis::get_lemma(), freeling::analysis::get_prob(), freeling::analysis::get_tag(), lemma_prefs, and pos_prefs.
Referenced by sort_list().
void freeling::probabilities::set_activate_guesser | ( | bool | b | ) |
void freeling::probabilities::smoothing | ( | word & | w | ) | const [private] |
Smooth probabilities for the analysis of given word.
If using backoff, combine with suffix information to get a better estimate.
References BiassSuffixes, class_tags, compute_probability(), freeling::word::get_form(), freeling::word::get_lc_form(), freeling::word::get_n_analysis(), freeling::tagset::get_short_tag(), lexical_tags, LidstoneLambdaClass, LidstoneLambdaLexical, single_tags, Tags, and TRACE.
Referenced by annotate_word().
void freeling::probabilities::sort_list | ( | std::list< analysis > & | ls | ) | const [private] |
Sort the given analysis list using lemma and PoS preferences.
Bubble-sorts the given analysis list according to the given preferences.
References less().
Referenced by annotate_word().
bool freeling::probabilities::activate_guesser [private] |
whether to use guesser for unknown words.
Referenced by annotate_word(), probabilities(), and set_activate_guesser().
double freeling::probabilities::BiassSuffixes [private] |
Interpolation factor to favor suffix probabilities versus ambiguity-class probabilities when smoothing known but unobserved words.
Referenced by probabilities(), and smoothing().
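The role of an interpolation factor can be illustrated with the sketch below. This page does not say how the suffix-based and ambiguity-class estimates are combined, so linear mixing followed by renormalization is assumed here purely for illustration; the function name interpolate and the TagProbs alias are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical alias: tag -> probability.
using TagProbs = std::map<std::wstring, double>;

// Mix two distributions over the tags of class_p.
// bias = 0 keeps the ambiguity-class estimate, bias = 1 keeps the suffix
// estimate. Linear mixing is an assumption, not FreeLing's documented rule.
TagProbs interpolate(const TagProbs &class_p, const TagProbs &suffix_p,
                     double bias) {
  assert(bias >= 0.0 && bias <= 1.0);
  TagProbs out;
  double sum = 0.0;
  for (const auto &kv : class_p) {
    auto it = suffix_p.find(kv.first);
    double ps = (it == suffix_p.end()) ? 0.0 : it->second;
    double p = bias * ps + (1.0 - bias) * kv.second;
    out[kv.first] = p;
    sum += p;
  }
  // Renormalize so the result sums to 1 over the tags considered.
  if (sum > 0.0) {
    for (auto &kv : out) kv.second /= sum;
  }
  return out;
}
```

A factor close to 1 lets the word's ending dominate, while a factor close to 0 trusts the ambiguity class.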
std::map<std::wstring,std::map<std::wstring,double> > freeling::probabilities::class_tags [private] |
probabilities for usual ambiguity classes
Referenced by probabilities(), and smoothing().
std::map<std::wstring,std::wstring> freeling::probabilities::lemma_prefs [private] |
Referenced by less(), and probabilities().
std::map<std::wstring,std::map<std::wstring,double> > freeling::probabilities::lexical_tags [private] |
lexical probabilities for frequent words
Referenced by probabilities(), and smoothing().
double freeling::probabilities::LidstoneLambdaClass [private] |
Referenced by probabilities(), and smoothing().
double freeling::probabilities::LidstoneLambdaLexical [private] |
Lambda parameter for smoothing via Lidstone's law.
Referenced by probabilities(), and smoothing().
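Lidstone's law adds a pseudo-count lambda to every tag before normalizing: p(tag) = (count(tag) + lambda) / (N + lambda * B), where N is the total count and B the number of possible tags. The sketch below shows the formula on a plain count table; the function name lidstone and its signature are hypothetical, and how FreeLing applies the estimate to its lexical and class tables is not shown on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Lidstone-smoothed estimate of p(tag) from raw counts.
// counts: observed count per tag; n_tags: size of the tag inventory (B);
// lambda: pseudo-count added to every tag, seen or unseen.
double lidstone(const std::map<std::wstring, double> &counts,
                const std::wstring &tag, std::size_t n_tags, double lambda) {
  assert(lambda >= 0.0 && n_tags > 0);
  double total = 0.0;
  for (const auto &kv : counts) total += kv.second;
  auto it = counts.find(tag);
  double c = (it == counts.end()) ? 0.0 : it->second;
  return (c + lambda) / (total + lambda * static_cast<double>(n_tags));
}
```

With lambda = 1 this is Laplace (add-one) smoothing; smaller values stay closer to the observed relative frequencies while still giving unseen tags a nonzero probability.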
std::wstring::size_type freeling::probabilities::long_suff [private] |
length of longest suffix
Referenced by probabilities().
std::map<std::wstring,std::wstring> freeling::probabilities::pos_prefs [private] |
Referenced by less(), and probabilities().
double freeling::probabilities::ProbabilityThreshold [private] |
Probability threshold for unknown words tags.
Referenced by guesser(), and probabilities().
freeling::regexp freeling::probabilities::RE_PunctNum [private] |
Auxiliary regexps.
Referenced by annotate_word().
std::map<std::wstring,double> freeling::probabilities::single_tags [private] |
unigram probabilities
Referenced by probabilities(), and smoothing().
const tagset* freeling::probabilities::Tags [private] |
Tagset description, to compute short versions of tags.
Referenced by guesser(), probabilities(), smoothing(), and ~probabilities().
double freeling::probabilities::theeta [private] |
Smoothing parameter for unknown-word suffixes.
Referenced by compute_probability(), and probabilities().
std::map<std::wstring,std::map<std::wstring,double> > freeling::probabilities::unk_suffs [private] |
list of tag frequencies for unknown word suffixes
Referenced by compute_probability(), and probabilities().
std::map<std::wstring,double> freeling::probabilities::unk_tags [private] |
list of tags and probabilities to assign to unknown words
Referenced by guesser(), and probabilities().