NER and IOB sequential classification

Submitted by flopezbello on Sun, 11/20/2016 - 19:27


Consider the following statement:
"La sentencia dictada por la Sra. Juez Letrado de Primera Instancia de Montevideo"

When you use Freeling to parse for NER/NEC, you get:

1 La el DA0FS0 DA pos=determiner|type=article|gen=feminine|num=singular - - - - - - -
2 sentencia sentencia NCFS000 NC pos=noun|type=common|gen=feminine|num=singular - - - - - - -
3 dictada dictar VMP00SF VMP pos=verb|type=main|mood=participle|num=singular|gen=feminine - - - - - - -
4 por por SP SP pos=adposition|type=preposition - - - - - - -
5 la el DA0FS0 DA pos=determiner|type=article|gen=feminine|num=singular - - - - - - -
6 Sra._Juez_Letrado_de_Primera_Instancia_de_el_Chuy sra._juez_letrado_de_primera_instancia_de_el_chuy NP00O00 NP pos=noun|type=proper|neclass=organization B-ORG - - - - - -

which is not correct for line 6. One would expect, for line 6, something like:

Sra. B-PER
Juez I-PER
Letrado I-PER
de I-PER
Primera I-PER
Instancia I-PER
Montevideo B-LOC

I've been trying to tune configuration files tw*dat and gen*dat with no luck.

Any ideas? Am I missing something?


FreeLing is a library, and what it produces as a result is a data structure.
The "analyzer" program is just a sample of how to use this library. It has many options and output formats, but it cannot cover every possible format or imaginable combination.

As written in the manual:

"Thus, the question is not 'why this program doesn't offer functionality X?', 'why it doesn't output information Y?', or 'why it doesn't present results in format Z?', but 'How should I use FreeLing library to write a program that does exactly what I need?'."

So, if you need some output or processing not offered by the sample program, you need to write your own main program (or modify "analyzer" to get it).

In your case, if all you need is to break multiword NEs into separate tokens, you can write a simple Python script (or Perl, awk, or whatever you prefer) to do the job.
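
For illustration, here is a minimal sketch of such a script in Python. It assumes the column layout shown in the question: the word form in the second column and the B-XXX label in the seventh. The names split_ne and NE_COLUMN are made up for this example and are not part of FreeLing. Note that it only splits the multiword and propagates the class FreeLing already assigned. It does not re-classify the parts, so the NE in the question would come out as B-ORG/I-ORG, not as PER and LOC.

```python
# Assumption: 0-based index of the column holding the B-XXX label,
# taken from the analyzer output pasted in the question.
NE_COLUMN = 6


def split_ne(line, ne_column=NE_COLUMN):
    """Turn one analyzer output line into a list of (word, IOB tag) pairs."""
    fields = line.split()
    if len(fields) <= ne_column:
        return []  # blank line (sentence separator) or unexpected layout

    form, tag = fields[1], fields[ne_column]
    if not tag.startswith("B-"):
        return [(form, "O")]  # not a named entity

    ne_class = tag[2:]  # e.g. "ORG"
    # FreeLing joins the words of a multiword NE with underscores.
    words = [w for w in form.split("_") if w]
    return [(w, ("B-" if i == 0 else "I-") + ne_class)
            for i, w in enumerate(words)]


# Small demo on two lines in the same layout as the question's output.
sample = [
    "5 la el DA0FS0 DA pos=determiner|type=article - - - - - - -",
    "6 Sra._Juez_Letrado sra._juez_letrado NP00O00 NP pos=noun|type=proper B-ORG - - - - - -",
]
for line in sample:
    for word, tag in split_ne(line):
        print(word, tag)
```

To run it over real output, you would loop over the lines produced by "analyzer" (for example, read from sys.stdin) instead of the sample list. If your output format puts the label in a different column, adjust NE_COLUMN accordingly.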