FreeLing
4.0
Class splitter implements a sentence splitter, which accumulates lists of words until a sentence is completed, and then returns a list of sentence objects.
#include <splitter.h>
Classes | |
class | splitter_status |
Public Types | |
typedef splitter_status * | session_id |
Public Member Functions | |
splitter (const std::wstring &splitfile) | |
Constructor, given option file. | |
~splitter () | |
Destructor. | |
session_id | open_session () const |
open a splitting session, get session id | |
void | close_session (session_id ses) const |
close splitting session | |
void | split (session_id ses, const std::list< word > &lw, bool flush, std::list< sentence > &ls) const |
Add the given list of words to the buffer, and put the complete sentences that can be built into ls. | |
std::list< sentence > | split (session_id ses, const std::list< word > &ls, bool flush) const |
Same as the previous method, but the resulting sentences are returned. | |
void | split (const std::list< word > &lw, std::list< sentence > &ls) const |
Sessionless splitting. | |
std::list< sentence > | split (const std::list< word > &ls) const |
Sessionless splitting, return a copy of the sentences. | |
Private Member Functions | |
bool | end_of_sentence (std::list< word >::const_iterator, const std::list< word > &) const |
check for sentence markers | |
Private Attributes | |
bool | SPLIT_AllowBetweenMarkers |
configuration options | |
int | SPLIT_MaxWords |
std::set< std::wstring > | starters |
Sentence delimiters. | |
std::map< std::wstring, bool > | enders |
std::map< std::wstring, int > | markers |
Open-close marker pairs (parentheses, etc.) |
Class splitter implements a sentence splitter, which accumulates lists of words until a sentence is completed, and then returns a list of sentence objects.
freeling::splitter::splitter (const std::wstring &splitfile)
Constructor, given option file.
Create a sentence splitter.
References freeling::config_file::add_section(), freeling::config_file::close(), ERROR_CRASH, freeling::config_file::get_content_line(), freeling::config_file::get_section(), freeling::config_file::open(), SAME, and TRACE.
freeling::splitter::~splitter ()
Destructor.
void freeling::splitter::close_session (session_id ses) const
Close splitting session.
bool freeling::splitter::end_of_sentence (std::list< word >::const_iterator, const std::list< word > &) const [private]
check for sentence markers
Check whether a word is a sentence end (e.g., a dot followed by a capitalized word).
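The check described above (a dot followed by a capitalized word) can be sketched roughly as follows. This is only an illustration, not FreeLing's implementation: it works on plain std::wstring tokens rather than freeling::word objects, the function name looks_like_sentence_end is hypothetical, and the hard-coded ender set is an assumption (the real splitter reads its enders and starters from the option file). The sketch also treats the last token of the list as a valid end, whereas the real splitter may keep words buffered until it sees what follows.

```cpp
#include <cwctype>
#include <iterator>
#include <list>
#include <set>
#include <string>

// Hypothetical helper, not part of FreeLing: returns true when the token at
// `it` is a sentence ender and is either the last token in `lw` or is
// followed by a token that starts with an uppercase letter.
bool looks_like_sentence_end(std::list<std::wstring>::const_iterator it,
                             const std::list<std::wstring> &lw) {
  static const std::set<std::wstring> enders = {L".", L"?", L"!"};  // assumed set
  if (it == lw.end() || enders.count(*it) == 0) return false;

  auto next = std::next(it);
  if (next == lw.end()) return true;  // nothing follows: treat as an end

  // An ender followed by a capitalized word closes the sentence.
  return !next->empty() &&
         std::iswupper(static_cast<wint_t>(next->front())) != 0;
}
```

With the tokens "Hello", ".", "World", ".", "again", the first dot qualifies as a sentence end under this rule, while the second does not because a lowercase word follows it.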
session_id freeling::splitter::open_session () const
Open a splitting session and get its session id.
Open a session, and create a copy of the internal status for it.
Sessions are needed in case the same splitter is used to split different files simultaneously (either by the same thread or by different threads).
References freeling::splitter::splitter_status::betweenMrk, int2wstring, freeling::splitter::splitter_status::no_split_count, freeling::splitter::splitter_status::nsentence, and TRACE.
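Since every open_session() call needs a matching close_session(), one possible way to pair them is a small RAII wrapper. The sketch below is not part of FreeLing: session_guard, mock_splitter, mock_status, and sessions_open_in_scope are hypothetical names, and the mock only stands in for the documented interface (a nested session_id type plus const open_session() and close_session()) so that the example compiles without the library.

```cpp
#include <cassert>

// Hypothetical RAII wrapper, not part of FreeLing. It works with any type
// exposing a nested session_id type plus open_session() and
// close_session(session_id), which is the interface documented here.
template <class Splitter>
class session_guard {
 public:
  explicit session_guard(const Splitter &sp)
      : sp_(sp), ses_(sp.open_session()) {}
  ~session_guard() { sp_.close_session(ses_); }

  // A session must be closed exactly once, so copying is disabled.
  session_guard(const session_guard &) = delete;
  session_guard &operator=(const session_guard &) = delete;

  typename Splitter::session_id get() const { return ses_; }

 private:
  const Splitter &sp_;
  typename Splitter::session_id ses_;
};

// Stand-in used only to exercise the guard without linking FreeLing.
struct mock_status { int nsentence = 0; };
struct mock_splitter {
  typedef mock_status *session_id;
  mutable int open_sessions = 0;
  session_id open_session() const { ++open_sessions; return new mock_status(); }
  void close_session(session_id ses) const { --open_sessions; delete ses; }
};

// Opens two independent sessions on one splitter and reports how many are
// open inside the scope; both are closed when the guards are destroyed.
int sessions_open_in_scope(const mock_splitter &sp) {
  session_guard<mock_splitter> a(sp), b(sp);
  assert(a.get() != b.get());  // each session has its own status object
  return sp.open_sessions;
}
```

Under the assumption that the real class follows the signatures listed on this page, the same guard could be instantiated as session_guard<freeling::splitter>, with get() passed as the ses argument of split().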
void freeling::splitter::split (session_id ses, const std::list< word > &lw, bool flush, std::list< sentence > &ls) const
Add the given list of words to the buffer, and put the complete sentences that can be built into ls.
The flush flag states whether a buffer flush must be forced (true) or whether some words may remain in the buffer (false) while the splitter waits to see what comes next. Each thread using the same splitter needs to open a new session.
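The effect of the flush flag can be illustrated with a toy model. The code below is not FreeLing code: toy_splitter, toy_sentence, and feed_two_chunks are hypothetical names, tokens are plain std::wstring values instead of freeling::word objects, and the splitting rule (a "." token closes a sentence) is a deliberate simplification. It only mirrors the shape of the session-based split() documented here.

```cpp
#include <cassert>
#include <list>
#include <string>

typedef std::list<std::wstring> toy_sentence;  // stand-in for freeling::sentence

// Toy model, not FreeLing code: a per-session buffer plus a trivial rule
// ("." closes a sentence) to show what the flush flag changes.
class toy_splitter {
 public:
  struct status { toy_sentence buffer; };
  typedef status *session_id;

  session_id open_session() const { return new status(); }
  void close_session(session_id ses) const { delete ses; }

  std::list<toy_sentence> split(session_id ses,
                                const std::list<std::wstring> &lw,
                                bool flush) const {
    assert(ses != nullptr);
    std::list<toy_sentence> ls;
    for (const std::wstring &w : lw) {
      ses->buffer.push_back(w);
      if (w == L".") {  // sentence complete: move it to the output
        ls.push_back(ses->buffer);
        ses->buffer.clear();
      }
    }
    if (flush && !ses->buffer.empty()) {  // forced flush: emit leftover words
      ls.push_back(ses->buffer);
      ses->buffer.clear();
    }
    return ls;
  }
};

// Feeds two chunks through one session and collects every sentence produced.
std::list<toy_sentence> feed_two_chunks() {
  toy_splitter sp;
  toy_splitter::session_id ses = sp.open_session();

  // flush == false: "More" has no ender yet, so it stays in the buffer.
  std::list<toy_sentence> all =
      sp.split(ses, {L"It", L"works", L".", L"More"}, false);

  // flush == true: the buffered "More" and the new "text" are emitted together.
  std::list<toy_sentence> rest = sp.split(ses, {L"text"}, true);
  all.splice(all.end(), rest);

  sp.close_session(ses);
  return all;
}
```

In this model, the first call yields one sentence and leaves "More" buffered in the session; the second call, made with flush set to true, emits the buffered word together with the new one as a final sentence.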
std::list< sentence > freeling::splitter::split (session_id ses, const std::list< word > &ls, bool flush) const
Same as the previous method, but the resulting sentences are returned.
void freeling::splitter::split (const std::list< word > &lw, std::list< sentence > &ls) const
Sessionless splitting.
Fill the given list<sentence>.
std::list< sentence > freeling::splitter::split (const std::list< word > &ls) const
Sessionless splitting, return a copy of the sentences.
std::map<std::wstring,bool> freeling::splitter::enders [private] |
std::map<std::wstring,int> freeling::splitter::markers [private] |
Open-close marker pairs (parentheses, etc.)
bool freeling::splitter::SPLIT_AllowBetweenMarkers [private] |
configuration options
int freeling::splitter::SPLIT_MaxWords [private] |
std::set<std::wstring> freeling::splitter::starters [private] |
Sentence delimiters.