FreeLing 4.0
Class alternatives suggests words that are orthographically/phonetically similar to the input word.
#include <alternatives.h>
Public Member Functions | |
alternatives (const std::wstring &) | |
Constructor. | |
~alternatives () | |
Destructor. | |
void | get_similar_words (const std::wstring &, std::list< std::pair< std::wstring, int > > &) const |
direct access to results of underlying automata | |
void | analyze (sentence &) const |
spell check each word in sentence | |
Private Member Functions | |
void | filter_candidate (const std::wstring &, const std::wstring &, int distance, std::map< std::wstring, int > &) const |
filter given candidate and decide if it is a valid alternative. | |
void | filter_alternatives (const std::list< std::pair< std::wstring, int > > &, word &) const |
adds the new words that are possible correct spellings of the original word to the word analysis data | |
std::list< std::wstring > | recover_words (std::list< std::wstring > wds) const |
retrieve all possible word sequences that match (one-to-one) the given sound sequence | |
Private Attributes | |
foma_FSM * | sed |
FSM for orthographic/phonetic edit distance. | |
foma_FSM * | comp |
FSM for orthographic/phonetic compound analysis. | |
std::multimap< std::wstring, std::wstring > | orthography |
remember which word(s) every phonetic form came from (only for phonetic distances) | |
phonetics * | ph |
The class that translates a word into phonetic sounds. | |
int | DistanceThreshold |
Maximum distance to consider an entry as an alternative. | |
int | MaxSizeDiff |
Maximum length difference to consider a word as a possible correction. | |
freeling::regexp | CheckKnownTags |
tags of known words to be checked | |
bool | CheckUnknown |
whether unknown words should be checked | |
int | DistanceType |
Static Private Attributes | |
static const int | ORTHOGRAPHIC = 1 |
type of distance used | |
static const int | PHONETIC = 2 |
Class alternatives suggests words that are orthographically/phonetically similar to the input word.
Results may be used for spell checking.
freeling::alternatives::alternatives | ( | const std::wstring & | altsFile | ) |
Constructor.
Create an alternatives module, loading the dictionary and options.
Create phonetic transcriptor
References freeling::util::absolute(), freeling::config_file::add_section(), CheckKnownTags, CheckUnknown, freeling::config_file::close(), comp, DistanceThreshold, DistanceType, ERROR_CRASH, freeling::config_file::get_content_line(), freeling::config_file::get_section(), freeling::phonetics::get_sound(), freeling::util::lowercase(), MaxSizeDiff, freeling::util::new_tempfile_name(), freeling::config_file::open(), freeling::util::open_utf8_file(), ORTHOGRAPHIC, orthography, ph, PHONETIC, sed, freeling::foma_FSM::set_cutoff_threshold(), TRACE, WARNING, wstring2int, and wstring2string.
void freeling::alternatives::analyze | ( | sentence & | se | ) | const [virtual] |
spell check each word in sentence
Navigates the sentence adding alternative words (possible correct spelling data)
Implements freeling::processor.
References CheckKnownTags, CheckUnknown, filter_alternatives(), get_similar_words(), int2wstring, freeling::regexp::search(), and TRACE.
void freeling::alternatives::filter_alternatives | ( | const std::list< std::pair< std::wstring, int > > & | , |
word & | |||
) | const [private] |
adds the new words that are possible correct spellings of the original word to the word analysis data
adds the new words that are valid alternatives.
auxiliary list to sort alternatives by edit distance + length difference
References freeling::word::alternatives_begin(), freeling::word::alternatives_end(), freeling::word::clear_alternatives(), freeling::compare_alternatives(), DistanceThreshold, DistanceType, filter_candidate(), freeling::word::get_alternatives(), freeling::word::get_form(), freeling::word::get_lc_form(), PHONETIC, and TRACE.
Referenced by analyze().
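The note above about sorting alternatives by edit distance plus length difference can be illustrated with a small standalone sketch. It does not use FreeLing and is not the library's implementation: the names `scored_word` and `rank_alternatives` are hypothetical, and applying the distance threshold before sorting is an assumption based only on the description of DistanceThreshold.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// (candidate word, edit distance to the original form)
using scored_word = std::pair<std::wstring, int>;

// Hypothetical ranking helper, not part of FreeLing.
// Drops candidates whose distance exceeds `threshold`, then orders the rest
// by edit distance + absolute length difference to `form` (lowest first).
std::vector<scored_word> rank_alternatives(const std::wstring &form,
                                           std::vector<scored_word> cands,
                                           int threshold) {
  assert(threshold >= 0);
  cands.erase(std::remove_if(cands.begin(), cands.end(),
                             [threshold](const scored_word &c) {
                               return c.second > threshold;
                             }),
              cands.end());

  auto cost = [&form](const scored_word &c) {
    long diff = static_cast<long>(c.first.size()) -
                static_cast<long>(form.size());
    return c.second + std::labs(diff);
  };

  // stable_sort keeps the input order among candidates with equal cost
  std::stable_sort(cands.begin(), cands.end(),
                   [&cost](const scored_word &a, const scored_word &b) {
                     return cost(a) < cost(b);
                   });
  return cands;
}
```

Under these assumptions, a candidate with a slightly larger edit distance can still rank above a closer one if its length matches the original form better.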
void freeling::alternatives::filter_candidate | ( | const std::wstring & | , |
const std::wstring & | , | ||
int | distance, | ||
std::map< std::wstring, int > & | |||
) | const [private] |
filter given candidate and decide if it is a valid alternative.
References int2wstring, MaxSizeDiff, and TRACE.
Referenced by filter_alternatives().
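This page does not spell out the filtering criteria, only that the function references MaxSizeDiff. The following standalone sketch shows one plausible filter under that assumption: a candidate is rejected when its length differs too much from the original form, and the best distance per candidate is kept. The function name and the extra `max_size_diff` parameter are hypothetical; this is not FreeLing's code.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical stand-in for a candidate filter, not FreeLing's actual code.
// Keeps `candidate` only if its length differs from `form` by at most
// `max_size_diff`, and records the smallest distance seen per candidate.
void filter_candidate_sketch(const std::wstring &form,
                             const std::wstring &candidate,
                             int distance,
                             int max_size_diff,
                             std::map<std::wstring, int> &accepted) {
  assert(distance >= 0);
  long diff = static_cast<long>(form.size()) -
              static_cast<long>(candidate.size());
  if (std::labs(diff) > max_size_diff) return;  // too different in length

  auto it = accepted.find(candidate);
  if (it == accepted.end())
    accepted.emplace(candidate, distance);  // first time this word is seen
  else if (distance < it->second)
    it->second = distance;                  // keep the best distance
}
```

Storing results in a map keyed by word means a candidate reached through several paths appears only once in the output.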
void freeling::alternatives::get_similar_words | ( | const std::wstring & | , |
std::list< std::pair< std::wstring, int > > & | |||
) | const |
direct access to results of underlying automata
Provide direct access to the results of the underlying automata, in case the caller only wants the list of strings.
References comp, DistanceType, freeling::foma_FSM::get_similar_words(), freeling::phonetics::get_sound(), MaxSizeDiff, ORTHOGRAPHIC, ph, PHONETIC, recover_words(), sed, TRACE, and wstring2list.
Referenced by analyze().
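To make the shape of the output list concrete, here is a brute-force stand-in for the lookup, written without FreeLing or foma. It assumes plain Levenshtein distance over a small in-memory dictionary, whereas the class itself relies on finite-state automata. The names `edit_distance` and `similar_words_sketch` and the `dictionary` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Plain Levenshtein distance (unit cost for insert, delete, substitute).
int edit_distance(const std::wstring &a, const std::wstring &b) {
  std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = static_cast<int>(i);
    for (std::size_t j = 1; j <= b.size(); ++j) {
      int subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
    }
    prev.swap(cur);  // the row just computed becomes the previous row
  }
  return prev[b.size()];
}

// Hypothetical brute-force replacement for the automaton lookup:
// appends (word, distance) for every dictionary entry within max_distance.
void similar_words_sketch(const std::wstring &form,
                          const std::vector<std::wstring> &dictionary,
                          int max_distance,
                          std::list<std::pair<std::wstring, int>> &results) {
  assert(max_distance >= 0);
  for (const auto &entry : dictionary) {
    int d = edit_distance(form, entry);
    if (d <= max_distance) results.emplace_back(entry, d);
  }
}
```

The output parameter mirrors the `std::list< std::pair< std::wstring, int > >` type in the signature above; a linear scan like this is only practical for small dictionaries.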
list< wstring > freeling::alternatives::recover_words | ( | std::list< std::wstring > | wds | ) | const [private] |
retrieve all possible word sequences that match (one-to-one) the given sound sequence
Given a list of sounds, retrieve all possible lists of words that match them (one-to-one)
References orthography.
Referenced by get_similar_words().
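The one-to-one expansion from sounds back to written words can be pictured as a Cartesian product over a sound-to-spelling multimap. The sketch below is a standalone illustration, not FreeLing's implementation: the function name is hypothetical, the multimap is passed in explicitly rather than read from the orthography member, and joining each sequence with a space is an assumption.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>

// Hypothetical sketch: expand a sequence of phonetic forms into every
// sequence of written words that could have produced it. Each result is
// one word sequence joined with spaces (the separator is an assumption).
std::list<std::wstring> recover_words_sketch(
    const std::multimap<std::wstring, std::wstring> &orthography,
    const std::list<std::wstring> &sounds) {
  if (sounds.empty()) return {};

  std::list<std::wstring> results(1, std::wstring());  // one empty prefix
  bool first = true;
  for (const auto &sound : sounds) {
    auto range = orthography.equal_range(sound);
    std::list<std::wstring> extended;
    for (const auto &prefix : results)
      for (auto it = range.first; it != range.second; ++it)
        extended.push_back(first ? it->second
                                 : prefix + L" " + it->second);
    results.swap(extended);  // becomes empty if a sound has no spelling
    first = false;
  }
  return results;
}
```

With a map that sends one sound to two spellings and a second sound to one spelling, the two-sound input yields two word sequences; a sound with no known spelling yields an empty result.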
freeling::regexp freeling::alternatives::CheckKnownTags [private] |
tags of known words to be checked
Referenced by alternatives(), and analyze().
bool freeling::alternatives::CheckUnknown [private] |
whether unknown words should be checked
Referenced by alternatives(), and analyze().
foma_FSM* freeling::alternatives::comp [private] |
FSM for orthographic/phonetic compound analysis.
Referenced by alternatives(), and get_similar_words().
int freeling::alternatives::DistanceThreshold [private] |
Maximum distance to consider an entry as an alternative.
Referenced by alternatives(), and filter_alternatives().
int freeling::alternatives::DistanceType [private] |
Referenced by alternatives(), filter_alternatives(), and get_similar_words().
int freeling::alternatives::MaxSizeDiff [private] |
Maximum length difference to consider a word as a possible correction.
Referenced by alternatives(), filter_candidate(), and get_similar_words().
const int freeling::alternatives::ORTHOGRAPHIC = 1 [static, private] |
type of distance used
Referenced by alternatives(), and get_similar_words().
std::multimap<std::wstring,std::wstring> freeling::alternatives::orthography [private] |
remember which word(s) every phonetic form came from (only for phonetic distances)
Referenced by alternatives(), and recover_words().
phonetics* freeling::alternatives::ph [private] |
The class that translates a word into phonetic sounds.
Referenced by alternatives(), get_similar_words(), and ~alternatives().
const int freeling::alternatives::PHONETIC = 2 [static, private] |
Referenced by alternatives(), filter_alternatives(), and get_similar_words().
foma_FSM* freeling::alternatives::sed [private] |
FSM for orthographic/phonetic edit distance.
Referenced by alternatives(), get_similar_words(), and ~alternatives().