Dear developer,
Could you please check these two results? It looks like the "morfo" output format gives better results.
http://koichi.nihon.to/psnl/tmp/fl1.png
http://koichi.nihon.to/psnl/tmp/fl2.png
Here is the test sentence:
Marilla lighted a candle and told Anne to follow her, which Anne spiritlessly did, taking her hat and carpet-bag from the hall table as she passed.
If I don't want to use NER, is it better to use the "morfo" output format than "tagged"?
Thank you!
If you deactivate NER, proper nouns are not recognized, so you will not get a proper analysis for "Anne".
In any case, "morfo" is not better than "tagged". Taking the top-ranked analysis from "morfo" output is equivalent to a unigram tagger, and is therefore context-independent. That is, the word "walk" would always get the tag "VB", regardless of whether the sentence is "I walk every day" or "I took a long walk".
On the other hand, "tagged" output will vary depending on the sentence.
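To illustrate the point about context independence, here is a minimal sketch of a unigram tagger in Python. It is not FreeLing code: the tiny corpus, the tag names, and the functions `train_unigram` and `tag_unigram` are all made up for the illustration. The tagger simply remembers the most frequent tag seen for each word, so it cannot tell the verb "walk" from the noun "walk".

```python
from collections import Counter, defaultdict

# Hypothetical hand-made training data with Penn Treebank-style tags.
TAGGED = [
    [("I", "PRP"), ("walk", "VB"), ("every", "DT"), ("day", "NN")],
    [("They", "PRP"), ("walk", "VB"), ("home", "NN")],
    [("I", "PRP"), ("took", "VBD"), ("a", "DT"), ("long", "JJ"), ("walk", "NN")],
]

def train_unigram(sentences):
    """Map each lowercased word to the tag it carries most often."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for word, tag in sent:
            counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag_unigram(model, words, default="NN"):
    """Tag each word on its own, ignoring the surrounding words."""
    return [(w, model.get(w.lower(), default)) for w in words]

MODEL = train_unigram(TAGGED)
verb_use = tag_unigram(MODEL, "I walk every day".split())
noun_use = tag_unigram(MODEL, "I took a long walk".split())
print(verb_use)
print(noun_use)
```

In this toy data "walk" appears twice as a verb and once as a noun, so it is tagged "VB" in both sentences. A context-sensitive tagger, which is what the "tagged" output relies on, can use the neighbouring words ("a long ...") to choose the noun reading instead.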
So, to summarize: if your text contains proper nouns, you should not deactivate NER if you expect them to be analyzed correctly.
Can I disable multiwords detection of NER?
Thank you for your reply!
Then I will use NER. That's OK.
But can I disable the multiword detection in NER? NER extracts entities that are too long for my purposes. For example, it extracts "anne_of_green_gables" as a single word from a sentence below.
I would like to extract and count "Anne" as a word, not "anne_of_green_gables". Could you suggest a possible solution?
Thanks!
You can tune the NER behaviour in the np.dat configuration file.
For instance, you could remove "of" from the list of function words that are allowed inside a named entity. Then you would get "Anne" and "Green_Gables". (Though for a text such as "Bank of External Commerce" you would get "Bank" and "External_Commerce", so you need to weigh the pros and cons.)
See the user manual for the details and the available options.
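The trade-off described above can be sketched with a toy grouping routine in Python. This is not FreeLing's NER and it does not read np.dat: the function `group_entities` and its rule (join runs of capitalized tokens, letting listed function words sit between them) are assumptions chosen only to mimic the effect of keeping or removing "of" from the allowed function words.

```python
def group_entities(tokens, function_words):
    """Toy rule: join runs of capitalized tokens with "_".

    A lowercase token may stay inside a run only if it is in
    function_words and is followed by another capitalized token.
    Every capitalized token starts a run, so this sketch would also
    pick up ordinary sentence-initial words.
    """
    entities = []
    i = 0
    while i < len(tokens):
        if not tokens[i][0].isupper():
            i += 1
            continue
        run = [tokens[i]]
        j = i + 1
        while j < len(tokens):
            if tokens[j][0].isupper():
                run.append(tokens[j])
                j += 1
            elif (tokens[j].lower() in function_words
                  and j + 1 < len(tokens)
                  and tokens[j + 1][0].isupper()):
                run.append(tokens[j])
                j += 1
            else:
                break
        entities.append("_".join(run))
        i = j
    return entities

title = "Anne of Green Gables".split()
bank = "Bank of External Commerce".split()

print(group_entities(title, {"of"}))  # "of" allowed inside entities
print(group_entities(title, set()))   # "of" removed from the list
print(group_entities(bank, set()))    # the side effect on other names
```

With "of" allowed, the title comes out as one long entity. Without it, the title splits into "Anne" and "Green_Gables", but "Bank of External Commerce" is split as well, which is the cost mentioned in the reply.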
Understood. Thank you very much!