Verb at the beginning of a sentence in Spanish

Submitted by samuelhrg on Tue, 01/10/2017 - 18:23

I found a small problem with the PoS tagging. In the sentence "Esperé a que volvieras", FreeLing tags "Esperé" as a noun. If I instead use "Yo esperé a que volvieras", then "Yo" is tagged as the noun and "esperé" as the verb.
I can guess where the problem comes from. In Spanish you don't need to state the subject at the beginning of the sentence, because the verb already gives you this information. "Esperé" means "I waited"; there's no need to write "Yo esperé".
At first I thought it wasn't a big deal, but I can see it's going to be a big problem, because it's pretty common to omit the subject at the beginning of a sentence in Spanish.

It works for me:

$ echo "Yo esperé a que volvieras." | analyze -f es.cfg
Yo yo PP1CSN0 1
esperé esperar VMIS1S0 1
a a SP 0.998775
que que CS 0.449861
volvieras volver VMSI2S0 1
. . Fp 1


$ echo "Esperé a que volvieras." | analyze -f es.cfg
Esperé esperar VMIS1S0 1
a a SP 0.998775
que que CS 0.449861
volvieras volver VMSI2S0 1
. . Fp 1

Neither "esperé" nor "yo" is a noun in the dictionary, so you must be mixing something up (e.g. using the English configuration to analyze Spanish text).
If you provide some more details, maybe we can spot what you are doing wrong.
Which FreeLing version are you using? Which command are you running?

samuelhrg

Mon, 01/16/2017 - 16:55

Sorry I took so long to reply.
I'm using FreeLing 4.0 on Windows. I did some testing:

Yo esperé a que volvieras.
Esperé a que volvieras.

D:\Dropbox\CENIDET\Ruby>analyzer.bat -f "C:\Freeling-4.0\data\config\es.cfg" < texto.txt
  Fz 1
Yo yo NP00000 1
esperé esperar VMIS1S0 1
a a SP 0.998775
que que CS 0.449861
volvieras volver VMSI2S0 1
. . Fp 1

Esperé esperar VMIS1S0 1
a a SP 0.998775
que que CS 0.449861
volvieras volver VMSI2S0 1
. . Fp 1

That works fine, except that it marks "Yo" as a proper noun instead of a pronoun.
Then I noticed that it is the full text that's giving me trouble.

Esperé a que volvieras, Dra. Azucena. Durante días, semanas, años, quizá toda la vida.

D:\Dropbox\CENIDET\Ruby>analyzer.bat -f "C:\Freeling-4.0\data\config\es.cfg" < poema.txt
  Fz 1
Esperé esperé NP00000 1
a a SP 0.998775
que que CS 0.449861
volvieras volver VMSI2S0 1
, , Fc 1
Dra._Azucena dra._azucena NP00000 1
. . Fp 1

Durante durante SP 1
días día NCMP000 1
, , Fc 1
semanas semana NCFP000 1
, , Fc 1
años año NCMP000 1
, , Fc 1
quizá quizá RG 1
toda todo DI0FS0 0.984467
la el DA0FS0 0.98926
vida vida NCFS000 1
. . Fp 1

Now I'm guessing it has something to do with the context, but I'm not sure what's happening.

In the output of your command:

D:\Dropbox\CENIDET\Ruby>analyzer.bat -f "C:\Freeling-4.0\data\config\es.cfg" < texto.txt
  Fz 1
Yo yo NP00000 1
esperé esperar VMIS1S0 1
a a SP 0.998775
que que CS 0.449861
volvieras volver VMSI2S0 1
. . Fp 1

you can see that there is a line " Fz 1" before "Yo".

This line is a token that FreeLing created because there was something at the beginning of your file (maybe a BOM, i.e. a byte order mark, or the like; Windows editors tend to add this kind of garbage to plain-text files).
This character is recognized as "Fz", which means "unknown punctuation sign".
Then comes "Yo", a capitalized word following punctuation that does not end a sentence (Fz is not in the list of sentence-ending punctuation), so it is interpreted as a proper noun. The same thing happens to "Esperé" in your poema.txt output, which also starts with an "Fz" line.
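
To make that reasoning concrete, here is a toy sketch of the rule as described above. It is not FreeLing's actual code: the function name, the tag set, and the logic are all assumptions made up for illustration, and the real analyzer is considerably more involved.

```python
# Toy illustration only; NOT FreeLing's implementation.
# SENTENCE_END_TAGS is an assumed, illustrative subset of sentence-ending tags.
SENTENCE_END_TAGS = {"Fp"}


def looks_like_proper_noun(word, previous_tag):
    """Guess whether a capitalized word would be read as a proper noun.

    previous_tag is the tag of the preceding token, or None when the word
    is the very first token of the input.
    """
    # A word starts a sentence if nothing precedes it, or if the preceding
    # token is sentence-ending punctuation.
    starts_sentence = previous_tag is None or previous_tag in SENTENCE_END_TAGS
    # Capitalization is only informative when the word is NOT sentence-initial.
    return word[:1].isupper() and not starts_sentence
```

With this sketch, `looks_like_proper_noun("Yo", None)` is false (a clean file, "Yo" simply starts the sentence), while `looks_like_proper_noun("Yo", "Fz")` is true (a stray "Fz" token precedes it, so the capital letter is taken as evidence of a proper noun).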

Make sure that your file does not have any stray binary codes at the beginning, and you should be OK.
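
If the stray bytes do turn out to be a UTF-8 BOM (that is only a guess; they could be something else), one way to remove them is a small script like the one below. This is a minimal sketch that assumes the file is UTF-8 encoded; the helper names `strip_utf8_bom` and `clean_file` are made up for this example and are not part of FreeLing.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def clean_file(path: str) -> bool:
    """Rewrite the file at path without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False if the file was
    left untouched.
    """
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned == data:
        return False
    with open(path, "wb") as f:
        f.write(cleaned)
    return True
```

For example, calling `clean_file("texto.txt")` before running the analyzer should report whether anything was removed. If it returns False and the "Fz" line still appears, the leading bytes are something other than a UTF-8 BOM and are worth inspecting in a hex viewer.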