incorrect language on Semantic Graph Frame lemma

Submitted by carlesg on Fri, 07/29/2016 - 14:37

Hello,

When I query the SemanticGraph for the lemmas of its frames, I get the lemma in English, even though I have configured FreeLing for Spanish and the input sentence is in Spanish.
If I try the Spanish sentence 'Dime el valor del coche.', the tagger says:

-------- TAGGER results -----------
Di decir VMM02S0
me me PP1CS00
el el DA0MS0
valor valor NCMS000
de de SP
el el DA0MS0
coche coche NCMS000
. . Fp

But the semantic Graph says (look at the lemma on Frame F1):
-------- SEMANTIC GRAPH results -----------
ENTITY W2 : me
ENTITY W3 : valor
FRAME F1 : speak.01|talk.01 : 1 : 1 : 00941990-v
ARG A2:Co-Agent : W2
ARG A1:Topic : W3

I'm using a Java class on Windows to invoke freeling via the JNI library.

If I use the online demo, it works correctly and reports the lemma as 'decir.00'.
What is the problem? Is it a configuration issue?

By the way, what does the final number in the frame lemma mean? (.00, .01...)

Those codes are not the lemma, but the semantic code for the verb meaning in PropBank (http://propbank.github.io/).

E.g., for the verb "bear", PropBank has two senses, bear.01 and bear.02:
http://verbs.colorado.edu/propbank/framesets-english-aliases/bear.html

The idea is that you get language-independent information on the verb frame. So if the sentence were in English instead of Spanish, the verb codes would be the same. The same holds if the verb were "tell" instead of "say".
In this way, you get a language-independent semantic graph that can be useful to compare texts that express the same meaning using different words, or even in different languages.

If you get codes such as "decir.00", it is because the sense could not be disambiguated properly or was not found in PropBank (that is why you get ".00", which is not a PropBank sense).
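
As a small illustration of how such codes could be told apart in client code, here is a minimal Java sketch. It assumes only what is stated above: that alternative senses are joined with "|" (as in "speak.01|talk.01") and that a ".00" suffix marks a code that was not resolved to a PropBank sense. The class and method names are hypothetical and are not part of FreeLing.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the FreeLing API.
final class FrameCodes {

    /** Splits a frame code field such as "speak.01|talk.01" into its alternatives. */
    static List<String> split(String field) {
        return Arrays.asList(field.split("\\|"));
    }

    /** Assumption taken from this thread: a ".00" suffix means no PropBank sense was assigned. */
    static boolean isUnresolved(String code) {
        return code.endsWith(".00");
    }

    public static void main(String[] args) {
        for (String code : split("speak.01|talk.01")) {
            System.out.println(code + " unresolved=" + isUnresolved(code));
        }
        System.out.println("decir.00 unresolved=" + isUnresolved("decir.00"));
    }
}
```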

If you want to recover the lemma for a frame in the graph, be aware that the plain-text results printed by the analyzer are simplified.
With JSON or XML output you get the full graph, with links among components: a frame in the semantic graph carries the semantic code plus the ID of the token that originated the frame, and that token carries the lemma.
If you call the library yourself, you can navigate the document data structure to locate the token that originated the frame and read its lemma.

Thank you.
So the next question is: why does the demo show a different semantic code for the same word than the one I get in a local execution? Why can the demo not disambiguate the meaning, while the local execution can, for the same sentence?

The second question is:
How can I get the lemma of the frame's root word (in this case, 'Dime' -> 'decir') by traversing the semantic graph (the last option you mention)? How can I locate the original word that originated the frame? Which id should I match on?
Is there any way to map the objects in the semantic graph (doc.getSemanticGraph().getEntities(), doc.getSemanticGraph().getFrames() (root), and doc.getSemanticGraph().getFrames().getArguments()) to the objects in the trees or lists (Word, etc.)?

The demo is probably not running the same revision, but an older one, so there may be some differences.

For the sentence "Dime el valor del coche." you get the semantic graph:
<semantic_graph>
<entity id="W2" lemma="me">
<mention id="t1.2" words="me"/>
</entity>
<entity id="W3" lemma="valor" sense="05856388-n">
<mention id="t1.4" words="el valor de el coche"/>
<synonym lemma="valor"/>
<URI URI="http://wordnet-rdf.princeton.edu/wn30/05856388-n" knowledgeBase="WordNet"/>
<URI URI="http://ontologyportal.org/SUMO.owl#Quantity" knowledgeBase="SUMO"/>
</entity>
<frame id="F1" lemma="decir.00" sense="" token="t1.1">
<argument entity="W2" role="A2"/>
<argument entity="W3" role="A1"/>
</frame>
</semantic_graph>

You can see that the frame "F1" corresponds to token "t1.1".

Then, you can navigate the XML tree looking for a "<token>" with id="t1.1", and you will get:
<token ctag="VMM" form="Di" id="t1.1" lemma="decir" mood="imperative" num="singular" person="2" pos="verb" tag="VMM02S0" type="main">

from which you can read that the "lemma" is "decir".
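
The two-step lookup just described (frame -> "token" attribute -> "<token>" with that id -> "lemma" attribute) can be sketched in Java with the standard DOM parser alone. This is only an illustration: the sample string is a trimmed version of the output above, and the "<document>" and "<sentence>" wrapper elements around it are assumptions about the overall layout, as are the class and method names. The lookup itself searches by element name, so it does not depend on the exact nesting.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the FreeLing API.
final class FrameLemmaLookup {

    // Trimmed sample; the <document>/<sentence> wrappers are assumed for illustration.
    static final String SAMPLE =
          "<document>"
        + "<sentence id=\"1\">"
        + "<token id=\"t1.1\" form=\"Di\" lemma=\"decir\" tag=\"VMM02S0\"/>"
        + "<token id=\"t1.2\" form=\"me\" lemma=\"me\" tag=\"PP1CS00\"/>"
        + "</sentence>"
        + "<semantic_graph>"
        + "<frame id=\"F1\" lemma=\"decir.00\" sense=\"\" token=\"t1.1\">"
        + "<argument entity=\"W2\" role=\"A2\"/>"
        + "<argument entity=\"W3\" role=\"A1\"/>"
        + "</frame>"
        + "</semantic_graph>"
        + "</document>";

    /** Returns the lemma of the token that originated the frame, or null if not found. */
    static String lemmaForFrame(String xml, String frameId) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }

        // Step 1: find the frame and read the id of the token that originated it.
        String tokenId = null;
        NodeList frames = doc.getElementsByTagName("frame");
        for (int i = 0; i < frames.getLength(); i++) {
            Element frame = (Element) frames.item(i);
            if (frameId.equals(frame.getAttribute("id"))) {
                tokenId = frame.getAttribute("token");
                break;
            }
        }
        if (tokenId == null || tokenId.isEmpty()) {
            return null;
        }

        // Step 2: find the token with that id and read its lemma.
        NodeList tokens = doc.getElementsByTagName("token");
        for (int i = 0; i < tokens.getLength(); i++) {
            Element token = (Element) tokens.item(i);
            if (tokenId.equals(token.getAttribute("id"))) {
                return token.getAttribute("lemma");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(lemmaForFrame(SAMPLE, "F1")); // prints decir
    }
}
```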

If you are not using XML, but calling the library directly, you need to go to token 1 in sentence 1 of the document (that is what "t1.1" means).
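
To make the id convention concrete, here is a minimal Java sketch that splits an id such as "t1.1" into a 1-based sentence number and token number and uses them as indexes. The nested list of lemmas is only a stand-in for the real document structure, and the class and method names are hypothetical; with the actual library you would index into the sentences and words of the analyzed document instead. It also assumes sentences are numbered consecutively from 1.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the FreeLing API.
final class TokenIdLookup {

    // Matches ids of the form t<sentence>.<token>, e.g. "t1.1".
    private static final Pattern TOKEN_ID = Pattern.compile("t(\\d+)\\.(\\d+)");

    /** Splits "t1.1" into {sentence, token}, both 1-based. */
    static int[] parseTokenId(String id) {
        Matcher m = TOKEN_ID.matcher(id);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected token id: " + id);
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    /** Looks up a lemma in a stand-in structure: one list of lemmas per sentence. */
    static String lemmaAt(List<List<String>> lemmasBySentence, String tokenId) {
        int[] pos = parseTokenId(tokenId);
        // Ids are 1-based, Java lists are 0-based.
        return lemmasBySentence.get(pos[0] - 1).get(pos[1] - 1);
    }

    public static void main(String[] args) {
        // Lemmas from the tagger output for "Dime el valor del coche."
        List<List<String>> doc = Arrays.asList(
            Arrays.asList("decir", "me", "el", "valor", "de", "el", "coche", "."));
        System.out.println(lemmaAt(doc, "t1.1")); // prints decir
    }
}
```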