Hi, I have been using FreeLing for a few months now to extract triplets. So far I have succeeded in doing so by using the dependency tree and the full parse tree, but I am trying to improve my approach by using coreference, the semantic graph and NERC.
My work so far
I checked the Python tutorial, but I couldn't find anything beyond dependency parsing. So I went through the class list (since the same classes should be available in Python and C++) and decided to use relaxcor, but the analyze method of this class only accepts a document as a parameter.
So here is what I'd like help with:
1. Document creation problem: How do you create a document using the Python API? All the examples use a tokenizer and then a splitter, which returns a list of sentences. I checked some of the C++ examples in the simple_examples folder (https://github.com/TALP-UPC/FreeLing/blob/master/src/main/simple_exampl…) and tried the document method "insert" just like the example, but after several attempts I couldn't find a way to pass the right parameters. Apparently it expects:

```
std::list< freeling::paragraph >::insert(
    std::list< freeling::paragraph >::iterator,
    std::list< freeling::paragraph >::value_type const &)
```
So I tried the append method instead. It didn't throw any errors, but I haven't fully checked whether the result is correct. The only thing I checked is the number of words, and it didn't look right: my text was a dummy test sentence like "Sobre la mesa María ve y coge una manzana, un sombrero, una llave y dos paraguas rojo.", and after the append the document had 80 words.
Here is my code so far for this:
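```python
import pyfreeling

# tk and sp are the pyfreeling.tokenizer and pyfreeling.splitter created earlier
# tokenize input line into a list of words
lw = tk.tokenize(text)
# split list of words in sentences, return list of sentences
ls = sp.split(lw)
# wrap the list of sentences in a single paragraph
par = pyfreeling.paragraph(ls)
# list_paragraphs = pyfreeling.ListParagraph([par])
doc = pyfreeling.document()
doc.append(par)  # std::list-style append, since insert() wants an iterator
```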
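A minimal sketch of an ordering that may matter here, under an assumption not confirmed in this post: that `pyfreeling.paragraph(ls)` copies the sentences, as a C++ `paragraph(const std::list<sentence>&)` constructor would. If so, any analysis run on `ls` after wrapping never reaches `doc`, so the sentences would need to go through the full pipeline first and only then be wrapped. The helper names `build_document` and `count_words` are hypothetical; the pyfreeling module is passed in as `pf` so the helpers can be tried without FreeLing installed. `append` is used because the post reports it works, and `get_words()` is the sentence accessor used in the pyfreeling sample script.

```python
def build_document(analyzed_sentences, pf):
    """Wrap an already-analyzed sentence list in a one-paragraph document.

    pf is the pyfreeling module (hypothetical parameter, passed in so this
    helper has no import-time dependency on FreeLing).
    """
    par = pf.paragraph(analyzed_sentences)  # paragraph built from the sentence list
    doc = pf.document()                     # empty document (a list of paragraphs)
    doc.append(par)                         # std::list-style append on the document
    return doc


def count_words(doc):
    """Count the words in a document: paragraphs -> sentences -> words."""
    return sum(1 for par in doc for sent in par for _ in sent.get_words())
```

With the real module this would be called as `doc = build_document(ls, pyfreeling)` after all the `analyze_sentence_list` calls, followed by `analyze(doc)` on the relaxcor instance. As a sanity check against the 80 words reported above, `count_words(doc)` should equal the number of words in `ls` itself.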
2. Doubt about entities: Going back to the example "Sobre la mesa María ve y coge una manzana, un sombrero, una llave y dos paraguas rojo.", I noticed that capitalized and lowercase text produce different results, but when I make everything lowercase, entity recognition no longer recognizes "maría" as a person. Is there a workaround for this, or am I going in the wrong direction? The main problem is that when "maría" is not recognized as a named entity (which I need it to be), it is no longer the subject of the sentence. Here is how I'm getting this:
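```python
import pyfreeling

# lpath and the analyzers (morfo, tagger, sen, wsd, parser, dep, srl_parser) are created earlier
neclass = pyfreeling.ner(lpath + "/nerc/ner/ner-ab-rich.dat")
ls = morfo.analyze_sentence_list(ls)
ls = tagger.analyze_sentence_list(ls)
ls = sen.analyze_sentence_list(ls)
ls = neclass.analyze_sentence_list(ls)
ls = wsd.analyze_sentence_list(ls)
# full parse tree first (dep_txala, if that is what dep is, builds on it)
ls = parser.analyze_sentence_list(ls)
ls = dep.analyze_sentence_list(ls)
# semantic roles are labeled on top of the dependency tree, so SRL runs last
ls = srl_parser.analyze_sentence_list(ls)
```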
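One possible workaround, offered as an assumption rather than documented FreeLing advice: run the analyzers on the original, capitalized text so that "María" is still detected (the behavior described above suggests proper-noun detection relies on capitalization), and lowercase only afterwards, when building the triplets. The helper `lowered_tokens` is hypothetical; it keeps the lemma and tag from the cased analysis and lowercases only the surface form. `get_form()`, `get_lemma()` and `get_tag()` are the word accessors used in the pyfreeling sample script.

```python
def lowered_tokens(sentence):
    """Return (lowercased form, lemma, tag) for each word of an analyzed sentence.

    The sentence is assumed to have been analyzed from the capitalized input,
    so tags (e.g. proper-noun tags) and parse trees are unaffected by lowercasing.
    """
    return [
        (w.get_form().lower(), w.get_lemma(), w.get_tag())
        for w in sentence.get_words()
    ]
```

Since subjects and dependency arcs are computed before anything is lowercased, "María" should keep its role as the subject in the extracted triplets.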
3. How to retrieve named entities: As a follow-up to the previous question, how do I get the named entities out of the analysis? I couldn't find any code related to this.
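A minimal sketch for reading entities back, assuming the Spanish EAGLES tagset: words recognized as named entities get a proper-noun tag starting with `NP`, and once an NE classifier has run, characters 4-5 carry the class (`NP00SP0` person, `NP00G00` location, `NP00O00` organization, `NP00V00` other). Multiword entities are joined into a single word with underscores (e.g. `María_López`). The function name and the class-label dictionary are hypothetical; `get_form()`, `get_tag()` and `get_words()` are the accessors used in the pyfreeling sample script.

```python
# Mapping from characters 4-5 of an NP tag to a readable class (assumed EAGLES tagset).
NE_CLASSES = {"SP": "person", "G0": "location", "O0": "organization", "V0": "other"}


def named_entities(sentences):
    """Collect (form, class) pairs for words tagged as proper nouns (NP...)."""
    entities = []
    for sent in sentences:
        for w in sent.get_words():
            tag = w.get_tag()
            if tag.startswith("NP"):
                # NPs without a class code (e.g. NP00000) are reported as "unclassified"
                entities.append((w.get_form(), NE_CLASSES.get(tag[4:6], "unclassified")))
    return entities
```

Called as `named_entities(ls)` after the NE modules in the pipeline shown in question 2, this would return something like `[("María", "person")]` for the test sentence, provided the classifier tags it that way.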