Hi!
I was wondering how the POS tagger and lemmatizer were created: how they were trained, which corpus was used to train them, and so on. Is there any article about FreeLing that covers this?
Thanks!
BR Iván Arias Rodríguez
There is no paper about that. It is a standard HMM tagger, pretty similar to TnT (Brants 2000).
The PoS tagger is trained on the CoNLL 2009 data (for Spanish, that corresponds to the AnCora corpus).
The tagger is trained using the scripts in src/utilities/train-tagger in the FreeLing source.
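For readers who have not seen an HMM tagger before, here is a minimal sketch of the general idea in Python. It is not FreeLing's code and not the output of the train-tagger scripts. It is a deliberately simplified bigram model with add-one smoothing, whereas TnT uses trigrams with linear interpolation and suffix-based handling of unknown words. The function names (train, log_prob, viterbi) and the toy corpus are made up for illustration.

```python
import math
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers


def train(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
        trans[prev][END] += 1
    return trans, emit


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, trans, emit):
    """Return the most probable tag sequence for `words` (bigram HMM)."""
    if not words:
        return []
    tags = list(emit)
    # +1 reserves probability mass for words never seen in training
    vocab_size = len({w for c in emit.values() for w in c}) + 1
    n_next = len(tags) + 1  # every tag plus the end marker

    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{}]
    for t in tags:
        score = log_prob(trans[START], t, n_next)
        score += log_prob(emit[t], words[0].lower(), vocab_size)
        best[0][t] = (score, None)

    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            e = log_prob(emit[t], words[i].lower(), vocab_size)
            score, prev = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_next), p)
                for p in tags
            )
            best[i][t] = (score + e, prev)

    # choose the final tag, including the transition to the end marker
    last = max(tags, key=lambda t: best[-1][t][0] + log_prob(trans[t], END, n_next))
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


if __name__ == "__main__":
    # Toy training data, invented for this example only
    corpus = [
        [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
        [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
        [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    ]
    trans, emit = train(corpus)
    print(viterbi(["the", "cat", "barks"], trans, emit))
```

A real tagger would be trained on a full annotated corpus rather than three sentences, and would use a more careful smoothing scheme than add-one.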