Version of sense dictionary for Spanish?

Submitted by ulrike.henny on Tue, 12/14/2021 - 15:41

Hello,
I have been using FreeLing for some time now to annotate literary texts in Spanish. Thank you very much for this great and very useful tool!
Today I have a question about the version of the Spanish sense dictionary used in FreeLing. According to the documentation, the sense dictionary from the MCR 3.0 project is used. However, I noticed that some words that I would have expected to be annotated with sense information are not.
For example, the lemma "callejuela" is contained in MCR 3.0, as can be seen through the web interface at https://adimen.si.ehu.es/cgi-bin/wei/public/wei.consult.perl?item=calle…. However, this word is not annotated with sense information when I use the FreeLing analyzer. An example sentence in which it occurs is "Ibamos a salir de una callejuela formada con sacos de harina y cajas de fideos."
I used the following command to call the analyzer: analyze -f es.cfg --outlv tagged --sense ukb --nec --output xml < nh0010.txt > nh0010.xml
My version of FreeLing is 4.0 and I use it on Ubuntu 20.04.3 LTS.
I was wondering whether FreeLing uses the latest version of the MCR or maybe an older version with fewer senses. Of course I would consider upgrading to the latest version of FreeLing if that would make a difference, but I have not done so yet because I could not find any comments on this issue in the news about the latest versions.
Best regards,
Ulrike

4.0 is a bit outdated. It very likely includes an older version of MCR that does not contain "callejuela".

An easy way to fix it is to replace the file senses30.src with the one from a newer FreeLing version.

Also, upgrading to a more recent version will solve this and many other issues you may have.

ulrike.henny

Sat, 12/18/2021 - 05:45

Thank you very much! I removed version 4.0 and installed 4.2 and now indeed more senses are annotated than before, including "callejuela". It is also good to know where the file with the senses is.

Just for information, now I get some warnings of the following type:

Unknown synset 80000054-a ignored. Please check consistency between sense dictionary and KB

But I already found your discussion on this issue (https://githubmemory.com/repo/TALP-UPC/FreeLing/issues/109).

Thank you for your quick response and help!

That synset corresponds to "human" when used as an adjective, which seems to have been added to esWN, but not to the WordNets of other languages:

$ grep 80000054 data/es/senses30.src
80000054-a humano

The graph structure is common to all languages and does not include any relation for that synset, so the WSD algorithm complains about it.

$ grep 80000054 data/common/xwn.dat
01835496-v 80000054-v
08524735-n 80000054-n
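
To find all such synsets at once instead of grepping for them one by one, the same check can be scripted. The following is only a rough sketch: it assumes that each line of the sense dictionary starts with a synset ID followed by its lemmas, and that the graph file lists synset IDs separated by whitespace, as in the grep output above. The function names are made up for this example and are not part of FreeLing.

```python
import re

# Assumed shape of a synset ID, based on the examples in this thread:
# eight digits, a hyphen, and a one-letter part-of-speech code.
SYNSET_ID = re.compile(r"\d{8}-[a-z]")


def dictionary_synsets(lines):
    """Collect the synset ID that starts each sense-dictionary line."""
    found = set()
    for line in lines:
        fields = line.split()
        if fields and SYNSET_ID.fullmatch(fields[0]):
            found.add(fields[0])
    return found


def graph_synsets(lines):
    """Collect every token in the graph file that looks like a synset ID."""
    found = set()
    for line in lines:
        for token in line.split():
            if SYNSET_ID.fullmatch(token):
                found.add(token)
    return found


def missing_from_graph(dictionary_lines, graph_lines):
    """Return synsets used in the dictionary that have no relation in the graph."""
    return sorted(dictionary_synsets(dictionary_lines) - graph_synsets(graph_lines))


if __name__ == "__main__":
    # Sample lines copied from the grep output in this thread.
    senses = ["80000054-a humano"]
    graph = ["01835496-v 80000054-v", "08524735-n 80000054-n"]
    for synset in missing_from_graph(senses, graph):
        print(synset)
```

To run it on a real installation, pass the opened files instead of the sample lists, for example `missing_from_graph(open("data/es/senses30.src"), open("data/common/xwn.dat"))`, with the paths adjusted to wherever the FreeLing data is installed.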