Data about how the POS tagger was trained

Submitted by ivansiiito on Tue, 10/16/2018 - 03:23


I was wondering how the POS tagger and the lemmatizer were created. I mean: how were they trained, which corpus was used to train them, and so on. Is there any article about FreeLing that covers this?


Best regards, Iván Arias Rodríguez

There is no paper about that. It is a standard HMM tagger, pretty similar to TnT (Brants 2000).
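For readers unfamiliar with TnT, here is a minimal Python sketch of a TnT-style second-order (trigram) HMM tagger. Following Brants (2000), tag transitions are smoothed by linear interpolation, P(t3 | t1, t2) = λ1·P(t3) + λ2·P(t3 | t2) + λ3·P(t3 | t1, t2), with the λ weights estimated by deleted interpolation, and the best tag sequence is found with Viterbi decoding. This is not FreeLing's code. The function names, the START padding tag, the FLOOR constant, the toy tags, and the uniform score for unknown words are all assumptions for illustration; TnT itself handles unknown words with suffix analysis.

```python
import math
from collections import defaultdict

START = "<s>"   # hypothetical padding tag for the two positions before a sentence
FLOOR = 1e-12   # hypothetical floor so an unseen trigram never eliminates every path


def ratio(num, den):
    """Safe division: 0 when the denominator is not positive."""
    return num / den if den > 0 else 0.0


def train(sentences):
    """Count tag n-grams and word/tag pairs from [(word, tag), ...] sentences."""
    m = {k: defaultdict(int) for k in ("uni", "bi", "tri", "ctx1", "ctx2", "emit")}
    n = 0
    for sent in sentences:
        tags = [START, START] + [t for _, t in sent]
        for w, t in sent:
            m["emit"][(w, t)] += 1
        for i in range(2, len(tags)):
            t1, t2, t3 = tags[i - 2], tags[i - 1], tags[i]
            m["uni"][t3] += 1
            m["bi"][(t2, t3)] += 1
            m["tri"][(t1, t2, t3)] += 1
            m["ctx1"][t2] += 1          # t2 used as a bigram context
            m["ctx2"][(t1, t2)] += 1    # (t1, t2) used as a trigram context
            n += 1
    m["n"] = n
    m["tags"] = sorted(m["uni"])
    m["vocab"] = {w for w, _ in m["emit"]}
    m["lambdas"] = deleted_interpolation(m)
    return m


def deleted_interpolation(m):
    """Each trigram votes (weighted by its count) for the n-gram order whose
    leave-one-out estimate is largest; ties favour the higher order."""
    l1 = l2 = l3 = 0
    for (t1, t2, t3), f in m["tri"].items():
        c3 = ratio(f - 1, m["ctx2"][(t1, t2)] - 1)
        c2 = ratio(m["bi"][(t2, t3)] - 1, m["ctx1"][t2] - 1)
        c1 = ratio(m["uni"][t3] - 1, m["n"] - 1)
        if c3 >= c2 and c3 >= c1:
            l3 += f
        elif c2 >= c1:
            l2 += f
        else:
            l1 += f
    total = l1 + l2 + l3
    return (l1 / total, l2 / total, l3 / total)


def transition(m, t1, t2, t3):
    """Interpolated P(t3 | t1, t2)."""
    l1, l2, l3 = m["lambdas"]
    p1 = ratio(m["uni"].get(t3, 0), m["n"])
    p2 = ratio(m["bi"].get((t2, t3), 0), m["ctx1"].get(t2, 0))
    p3 = ratio(m["tri"].get((t1, t2, t3), 0), m["ctx2"].get((t1, t2), 0))
    return l1 * p1 + l2 * p2 + l3 * p3


def emission(m, word, tag):
    """P(word | tag) for known words; a uniform score for unknown ones (simplification)."""
    if word not in m["vocab"]:
        return 1.0 / len(m["tags"])
    return ratio(m["emit"].get((word, tag), 0), m["uni"][tag])


def viterbi(m, words):
    """Best tag sequence; states are (previous tag, current tag) pairs."""
    best = {(START, START): (0.0, [])}
    for w in words:
        new = {}
        for (t1, t2), (score, path) in best.items():
            for t3 in m["tags"]:
                e = emission(m, w, t3)
                if e == 0.0:
                    continue  # known word never seen with this tag
                q = max(transition(m, t1, t2, t3), FLOOR)
                s = score + math.log(q) + math.log(e)
                if (t2, t3) not in new or s > new[(t2, t3)][0]:
                    new[(t2, t3)] = (s, path + [t3])
        best = new
    return max(best.values(), key=lambda v: v[0])[1]


# Usage with a made-up toy corpus and toy tags
corpus = [
    [("el", "DA"), ("perro", "NC"), ("come", "VM")],
    [("el", "DA"), ("gato", "NC"), ("duerme", "VM")],
    [("la", "DA"), ("casa", "NC")],
]
model = train(corpus)
print(viterbi(model, ["el", "gato", "come"]))  # ['DA', 'NC', 'VM']
```

Storing whole paths in each Viterbi state keeps the sketch short; a real implementation would keep back-pointers instead, and would need a proper unknown-word model for the tagger to be usable on open text.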

The PoS tagger is trained on the CoNLL-2009 shared task data (for Spanish, this corresponds to the AnCora corpus).
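For reference, CoNLL-2009 files are tab-separated, with one token per line and a blank line between sentences. The leading columns are ID, FORM, LEMMA, PLEMMA, POS, PPOS, FEAT, PFEAT, followed by dependency and semantic-role columns. A minimal reader that extracts the gold form, lemma, PoS and features might look like the sketch below. The function name and the sample rows are made up for illustration (they are not copied from AnCora), and how FreeLing's own training scripts map these columns onto FreeLing's tag set is not covered here.

```python
def read_conll2009(lines):
    """Yield sentences as lists of (form, lemma, pos, feat) tuples."""
    sent = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():          # blank line ends a sentence
            if sent:
                yield sent
                sent = []
            continue
        cols = line.split("\t")
        # Column order: ID FORM LEMMA PLEMMA POS PPOS FEAT PFEAT HEAD ...
        sent.append((cols[1], cols[2], cols[4], cols[6]))
    if sent:                          # file may not end with a blank line
        yield sent


# Made-up sample in CoNLL-2009 layout (14 columns, no APRED columns)
sample = """1\tEl\tel\tel\td\td\tgen=m|num=s\tgen=m|num=s\t2\t2\tspec\tspec\t_\t_
2\tperro\tperro\tperro\tn\tn\tgen=m|num=s\tgen=m|num=s\t3\t3\tsuj\tsuj\t_\t_
3\tcome\tcomer\tcomer\tv\tv\tnum=s\tnum=s\t0\t0\tsentence\tsentence\t_\t_

1\tLlueve\tllover\tllover\tv\tv\tnum=s\tnum=s\t0\t0\tsentence\tsentence\t_\t_
""".splitlines()

sentences = list(read_conll2009(sample))
# (word, tag) pairs: the input shape an HMM tagger trainer needs
tagged = [[(form, pos) for form, _, pos, _ in s] for s in sentences]
print(tagged)  # [[('El', 'd'), ('perro', 'n'), ('come', 'v')], [('Llueve', 'v')]]
```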

The tagger is trained using the scripts in src/utilities/train-tagger in the FreeLing source tree.