Retokenization issue

Submitted by mgl on Thu, 02/13/2020 - 20:41

Hi,

I'm experiencing an issue with the analyzer. When I analyze the following text:

"E
después
leuanto
se
dela
oracion
⁊
acometio
co
cinquenta
cavalleros
solos
a
ochenta
mil
enemigos"

The result I get is:

"E e CC 0.494868
después después RG 1
leuanto levantar VMIP1S0 1
se se P00CN00 0.489655
dela dela_de+dela_el SP+DA 0.576647
oracion oración NCFS000 1
⁊ ⁊ Fz 1
acometio acometio VMIS3S0 1
co cu NCFS000 1
cinquenta 50 Z 1
cavalleros caballero NCMP000 0.621109
solos solo AQ0MP00 0.846154
a a SP 0.956937
ochenta_mil 80000 Z 1
enemigos enemigo NCMP000 0.65625"

The command I use is:

analyze -f es-old.cfg --nortk --nortkcon --input freeling --inplv splitted < $in > $out

"ochenta mil" is retokenized as "ochenta_mil". I need FreeLing not to retokenize my files, because I process them afterwards and rely on the original tokens.

I'm using FreeLing 4.1.

Did I miss something?

Sorry for the late answer, I missed your post.

Unfortunately, the number detection module does not have an option to output numbers as separate tokens. You can deactivate it with --nonumb, but then the number names will probably be recognized as nouns or adjectives.
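
If you would rather keep number detection on, another possibility is to post-process the output and split the joined tokens again. Below is a minimal Python sketch, not something FreeLing provides: the function name split_joined_tokens is made up for this example. It assumes each output line is "form lemma tag probability" as in your sample, that only the form field is checked for "_", and that every resulting word simply reuses its own form as lemma and inherits the tag and probability of the joined token (so the numeric lemma 80000 is lost).

```python
def split_joined_tokens(lines):
    """Expand rows whose form contains '_' into one row per original word.

    Assumption: each row is 'form lemma tag probability'. Rows with fewer
    than four fields (for example blank sentence separators) pass through.
    """
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or "_" not in fields[0]:
            # Not a joined token: keep the row exactly as it was.
            out.append(line)
            continue
        form, tag, prob = fields[0], fields[2], fields[3]
        for part in form.split("_"):
            # Assumption: reuse the word itself as lemma and copy the
            # tag and probability of the joined token.
            out.append(f"{part} {part} {tag} {prob}")
    return out


sample = [
    "a a SP 0.956937",
    "ochenta_mil 80000 Z 1",
    "enemigos enemigo NCMP000 0.65625",
]
for row in split_joined_tokens(sample):
    print(row)
```

With this, the row for ochenta_mil becomes two rows, one for ochenta and one for mil, so the token count matches the input again. Rows such as "dela" are left alone because the underscore is in the lemma, not in the form.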