Hello!
I am using FreeLing to parse sentences and obtain their dependency tree, the syntactic function of each node, and the lemma of each word.
I use version 3.1 of FreeLing from a Java program (via the native interface compiled with SWIG). The language I use is Spanish.
My problem concerns misspelled words, specifically words that should carry an accent but were written without it. In my case, the unaccented form also exists in the dictionary. Because the word is identified as that other entry, it gets the wrong PoS tag, and as a result the dependency tree and the syntactic functions / constituents are wrong too.
In the example below, the words 'último' and 'cuánto' were written without an accent ('ultimo' and 'cuanto'). FreeLing identifies 'ultimo' as a verb (instead of an adjective, which is what the accented word would be) and 'cuanto' as an adverb (rather than an interrogative pronoun). Each of these words appears in the analysis with a single reading.
Is there any way to make FreeLing take typing errors into account when analyzing words?
How can I configure FreeLing, or run the analysis, so that FreeLing uses 'último' instead of 'ultimo' to compute the PoS, dependency tree, syntactic function, etc.?
Example of erroneous words ('¿Cuanto he gastado el ultimo año?'):
grup-verb/top/(gastado gastar VMP00SM -) [
  vaux/aux/(he haber VAIP1S0 -)
  F-no-c/term/(¿ ¿ Fia -)
  sadv/cc/(Cuanto cuanto RG -)
  espec-ms/modnomatch/(el el DA0MS0 -)
  grup-verb/modnomatch/(ultimo ultimar VMIP1S0 -) [
    sn/cc/(año año NCMS000 -)
    F-term/term/(? ? Fit -)
  ]
]
Example of correct words ('¿Cuánto he gastado el último año?'):
grup-verb/top/(gastado gastar VMP00SM -) [
  vaux/aux/(he haber VAIP1S0 -)
  F-no-c/term/(¿ ¿ Fia -)
  sn/subj/(Cuánto cuánto PT0MS000 -)
  sn/cc/(año año NCMS000 -) [
    espec-ms/espec/(el el DA0MS0 -)
    s-a-ms/adj-mod/(último último AO0MS0 -)
  ]
  F-term/term/(? ? Fit -)
]
Thanks
It is possible, but not available out of the box.
"analyzer" is just an example program, and it does not give access to everything FreeLing can do. As shipped, "analyzer" is designed to process properly written text.
However, FreeLing contains some modules and tricks that can help you process misspelled text, but you'll need to combine them yourself in your own main program (or alter analyzer to do it).
Some possibilities:
Then the tagger will have the chance to select a different tag for that word.
Nevertheless, this will add a lot of ambiguity to the dictionary, so the tagger's performance (and thus the parser's) will suffer.
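To illustrate the general idea of offering accented alternatives for an unaccented word, here is a minimal Java sketch. It does not use any FreeLing API: the class AccentAlternatives, its methods, and the small word list are hypothetical names made up for this example, and the word list merely stands in for whatever set of dictionary forms you have. It indexes known forms by their accent-stripped key, so that looking up 'ultimo' also returns 'último'. Only acute accents on vowels are stripped, so 'ñ' is left alone and 'año' is never confused with 'ano'.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of FreeLing.
class AccentAlternatives {
    // Acute-accented vowels (á é í ó ú) and their plain counterparts.
    private static final String ACCENTED = "\u00e1\u00e9\u00ed\u00f3\u00fa";
    private static final String PLAIN = "aeiou";

    // Lowercase the word and strip acute accents; 'ñ' and 'ü' are kept.
    static String key(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder(lower.length());
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            int pos = ACCENTED.indexOf(c);
            sb.append(pos >= 0 ? PLAIN.charAt(pos) : c);
        }
        return sb.toString();
    }

    // Group every known form under its accent-stripped key.
    static Map<String, Set<String>> buildIndex(Collection<String> forms) {
        Map<String, Set<String>> index = new HashMap<>();
        for (String form : forms) {
            index.computeIfAbsent(key(form), k -> new TreeSet<>()).add(form);
        }
        return index;
    }

    // All known forms that differ from 'word' only by case or acute accents.
    static Set<String> alternatives(Map<String, Set<String>> index, String word) {
        return index.getOrDefault(key(word), Collections.emptySet());
    }

    public static void main(String[] args) {
        // Toy word list standing in for real dictionary forms (assumption).
        List<String> forms = Arrays.asList("ultimo", "último", "cuanto", "cuánto", "año");
        Map<String, Set<String>> index = buildIndex(forms);
        System.out.println(alternatives(index, "ultimo"));
        System.out.println(alternatives(index, "Cuanto"));
    }
}
```

Every alternative returned this way is one more reading the tagger has to choose between, which is exactly the extra ambiguity mentioned above, so it is probably worth restricting the lookup to words you actually suspect of missing an accent.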
Thanks
Thank you. I will try the third one, as it seems to be a robust solution.
I am working in a Java environment, and I have seen the Alternatives class in freeling.jar, so I suppose I could do that.