FreeLing
4.0
Class foma_FSM is a wrapper for the FOMA library, used specifically to retrieve the dictionary entries with minimum edit distance to a given key.
#include <foma_FSM.h>
Public Member Functions | |
foma_FSM (const std::wstring &, const std::wstring &mcost=L"", const std::list< std::wstring > &joins=std::list< std::wstring >()) | |
Build a regular automaton from a text file, with an optional cost matrix and, for a compound FSA, the join characters. | |
foma_FSM (std::wistream &, const std::wstring &mcost=L"", const std::list< std::wstring > &joins=std::list< std::wstring >()) | |
Build a regular automaton from a text buffer, with an optional cost matrix and, for a compound FSA, the join characters. | |
~foma_FSM () | |
Destructor: free foma structs. | |
void | get_similar_words (const std::wstring &, std::list< std::pair< std::wstring, int > > &) const |
Use automata to obtain closest matches to given form, and add them to given list. | |
void | set_cutoff_threshold (int) |
set maximum edit distance of desired results | |
void | set_num_matches (int) |
set maximum number of desired results | |
void | set_basic_operation_cost (int) |
Set cost for basic SED operations (insert, delete, substitute) | |
void | set_operation_cost (const std::wstring &, const std::wstring &, int) |
Set cost for a particular SED operation (replace "in" with "out") | |
std::set< std::wstring > | get_alphabet () |
get FSM alphabet | |
Private Member Functions | |
struct fsm * | load_dictionary_file (const std::wstring &fname) const |
Auxiliary for constructors: create a FSM loading a file. | |
struct fsm * | load_dictionary_buffer (std::wistream &buff) const |
Auxiliary for constructors: create a FSM loading a text buffer. | |
void | load_cost_matrix (const std::wstring &mcost) |
Auxiliary for constructors: Load cost matrix. | |
void | create_compound_FSA (const std::list< std::wstring > &joins) |
Auxiliary for constructors: create a compound-detector FSM. | |
void | init_MED () |
void | complete_alphabet (const std::wstring &) |
Auxiliary for constructor: complete FSM alphabet with missing symbols from cost matrix. | |
void | update_FSM_alphabet (const std::set< std::wstring > &) |
Private Attributes | |
struct fsm * | fsa |
foma automaton | |
struct apply_med_handle * | h_fsa |
Handle for foma minimum edit distance automaton. |
Class foma_FSM is a wrapper for the FOMA library, used specifically to retrieve the dictionary entries with minimum edit distance to a given key.
freeling::foma_FSM::foma_FSM | ( | const std::wstring & | , |
const std::wstring & | mcost = L"" , |
const std::list< std::wstring > & | joins = std::list< std::wstring >() |
) |
Build a regular automaton from a text file, with an optional cost matrix and, for a compound FSA, the join characters.
freeling::foma_FSM::foma_FSM | ( | std::wistream & | , |
const std::wstring & | mcost = L"" , |
const std::list< std::wstring > & | joins = std::list< std::wstring >() |
) |
Build a regular automaton from a text buffer, with an optional cost matrix and, for a compound FSA, the join characters.
freeling::foma_FSM::~foma_FSM | ( | ) |
Destructor: frees the foma structs.
void freeling::foma_FSM::complete_alphabet | ( | const std::wstring & | ) | [private] |
Auxiliary for constructor: complete the FSM alphabet with any symbols from the cost matrix that it is missing.
References ERROR_CRASH, int2wstring, freeling::print_sigma(), set2wstring, and TRACE.
void freeling::foma_FSM::create_compound_FSA | ( | const std::list< std::wstring > & | joins | ) | [private] |
Auxiliary for constructors: create a compound-detector FSM.
set< wstring > freeling::foma_FSM::get_alphabet | ( | ) |
get FSM alphabet
References string2wstring.
void freeling::foma_FSM::get_similar_words | ( | const std::wstring & | , |
std::list< std::pair< std::wstring, int > > & | |||
) | const |
Use the automaton to obtain the closest matches to the given form, adding each match (and its distance) to the given list.
References string2wstring, TRACE, and wstring2string.
Referenced by freeling::compounds::check_compound(), and freeling::alternatives::get_similar_words().
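To make the roles of the cutoff threshold and the number of matches concrete, here is a minimal sketch in plain standard C++. It does not use FreeLing or foma: it scans a word list linearly instead of searching an automaton, and it assumes unit costs. The names edit_distance and similar_words_sketch are hypothetical, and returning results sorted by increasing distance is an assumption of this sketch, not a documented property of get_similar_words.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Plain Levenshtein distance: insert, delete and substitute all cost 1.
int edit_distance(const std::wstring &a, const std::wstring &b) {
  std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = static_cast<int>(i);
    for (std::size_t j = 1; j <= b.size(); ++j) {
      int subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Hypothetical stand-in for get_similar_words (not the FreeLing code):
// appends (word, distance) pairs to `results`, keeping only words whose
// distance is at most `cutoff`, and at most `max_matches` of them.
void similar_words_sketch(const std::vector<std::wstring> &dict,
                          const std::wstring &form, int cutoff,
                          int max_matches,
                          std::list<std::pair<std::wstring, int>> &results) {
  std::vector<std::pair<std::wstring, int>> found;
  for (const auto &w : dict) {
    int d = edit_distance(form, w);
    if (d <= cutoff) found.emplace_back(w, d);
  }
  // Closest matches first; ties keep dictionary order.
  std::stable_sort(found.begin(), found.end(),
                   [](const std::pair<std::wstring, int> &x,
                      const std::pair<std::wstring, int> &y) {
                     return x.second < y.second;
                   });
  if (max_matches >= 0 &&
      found.size() > static_cast<std::size_t>(max_matches)) {
    found.erase(found.begin() + max_matches, found.end());
  }
  for (const auto &p : found) results.push_back(p);
}
```

Like get_similar_words, the sketch appends to the caller's list rather than returning a new one, so results from several queries can accumulate in the same container.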
void freeling::foma_FSM::init_MED | ( | ) | [private] |
void freeling::foma_FSM::load_cost_matrix | ( | const std::wstring & | mcost | ) | [private] |
Auxiliary for constructors: load the cost matrix, completing the alphabet if necessary.
References freeling::print_sigma(), TRACE, and wstring2string.
struct fsm * freeling::foma_FSM::load_dictionary_buffer | ( | std::wistream & | buff | ) | const [read, private] |
Auxiliary for constructors: create a FSM from a text buffer.
References TRACE, and wstring2string.
struct fsm * freeling::foma_FSM::load_dictionary_file | ( | const std::wstring & | fname | ) | const [read, private] |
Auxiliary for constructors: create a FSM loading a file.
References ERROR_CRASH, TRACE, and wstring2string.
void freeling::foma_FSM::set_basic_operation_cost | ( | int | cost | ) |
Set the cost of the basic SED operations (insert, delete, substitute) to the given value.
void freeling::foma_FSM::set_cutoff_threshold | ( | int | thr | ) |
Set the maximum edit distance of the results to retrieve.
Referenced by freeling::alternatives::alternatives().
void freeling::foma_FSM::set_num_matches | ( | int | max | ) |
Set the maximum number of matches to retrieve.
void freeling::foma_FSM::set_operation_cost | ( | const std::wstring & | , |
const std::wstring & | , | ||
int | |||
) |
Set the cost for a particular SED operation (replacing "in" with "out").
References wstring2string.
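The interplay between a basic operation cost and per-operation overrides can be illustrated with a small weighted edit distance in plain standard C++. This is only a sketch under stated assumptions: CostTable and weighted_distance are hypothetical names, overrides are limited to single-character substitutions, and nothing here reflects how foma stores or applies its cost matrix internally.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cost table (not a FreeLing type): one basic cost shared by
// insert, delete and substitute, plus overrides for specific in -> out
// substitutions.
struct CostTable {
  int basic = 1;
  std::map<std::pair<wchar_t, wchar_t>, int> special;

  // Cost of replacing character `in` with character `out`.
  int substitute(wchar_t in, wchar_t out) const {
    if (in == out) return 0;
    auto it = special.find(std::make_pair(in, out));
    return it != special.end() ? it->second : basic;
  }
};

// Weighted edit distance from `a` to `b` under the given cost table.
int weighted_distance(const std::wstring &a, const std::wstring &b,
                      const CostTable &c) {
  std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j)
    prev[j] = static_cast<int>(j) * c.basic;  // j insertions
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = static_cast<int>(i) * c.basic;   // i deletions
    for (std::size_t j = 1; j <= b.size(); ++j) {
      int subst = prev[j - 1] + c.substitute(a[i - 1], b[j - 1]);
      cur[j] = std::min({subst, prev[j] + c.basic, cur[j - 1] + c.basic});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}
```

In this sketch an override is directional: lowering the cost of replacing one character with another leaves the reverse substitution at the basic cost, which is one plausible reading of the separate "in" and "out" arguments.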
void freeling::foma_FSM::update_FSM_alphabet | ( | const std::set< std::wstring > & | ) | [private] |
References set2wstring, and wstring2string.
struct fsm* freeling::foma_FSM::fsa [private] |
foma automaton
struct apply_med_handle* freeling::foma_FSM::h_fsa [private] |
Handle for foma minimum edit distance automaton.