How to analyze an already tokenized file

Submitted by andres on Tue, 11/07/2017 - 20:40

I have already tried:

analyze --inplv token -f ca.cfg < orig.txt > target.txt
analyze --inplv splitted -f ca.cfg < orig.txt > target.txt

I also tried changing the original "InputLevel=text" to "InputLevel=token" in the ca.cfg file,
but it always says:
Error - 'text' input format only accepts input analysis level 'text'.

Command-line options override configuration file options. So, if you put them on the command line, you don't need to change the ca.cfg file.

There are two options controlling the kind of input: --inplv controls to what extent the input has already been analyzed (tokenized, split into sentences, tagged, etc.). The option --input controls the input format (plain text, FreeLing legacy format, CoNLL, XML, etc.).

The default --input is "text". Since your input is no longer plain text but has gone through some analysis stage (even if only tokenization), you should add the option "--input freeling" (the legacy FreeLing format is the only one that supports the tokenized level so far).
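If you drive FreeLing from a script, you can pass both options explicitly on the command line instead of editing ca.cfg, since command-line options take precedence over the configuration file. The sketch below only builds the argument list for `analyze` using the options named in this thread (`--input`, `--inplv`, `-f`); the helper name `build_analyze_cmd` is hypothetical. It uses "splitted" as the level because, as the later replies in this thread show, the "freeling" input format rejects "token".

```python
import shlex


def build_analyze_cmd(config, input_format=None, input_level=None):
    """Build an argv list for FreeLing's `analyze` (hypothetical helper).

    Options given here go on the command line, so they override settings
    such as InputLevel in the configuration file.
    """
    cmd = ["analyze"]
    if input_format is not None:
        cmd += ["--input", input_format]   # input format: text, freeling, ...
    if input_level is not None:
        cmd += ["--inplv", input_level]    # how far the input is already analyzed
    cmd += ["-f", config]
    return cmd


if __name__ == "__main__":
    cmd = build_analyze_cmd("ca.cfg", input_format="freeling", input_level="splitted")
    print(shlex.join(cmd))  # prints: analyze --input freeling --inplv splitted -f ca.cfg
    # To actually run it (requires FreeLing installed and on PATH):
    # import subprocess
    # with open("orig.txt") as src, open("target.txt", "w") as dst:
    #     subprocess.run(cmd, stdin=src, stdout=dst, check=True)
```

Passing an argv list (rather than one shell string) avoids quoting problems if file names contain spaces.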

 

I get this:

analyze --input freeling -f ca.cfg < orig.txt > target.txt
Error - 'freeling' input format only accepts input analysis levels 'splitted', 'morfo', 'tagged', and 'senses'.

No improvement though...

With --inplv before --input:
analyze --inplv token --input freeling -f ca.cfg < orig.txt > target.txt
Error - 'freeling' input format only accepts input analysis levels 'splitted', 'morfo', 'tagged', and 'senses'.

With --input before --inplv:
analyze --input freeling --inplv token -f ca.cfg < orig.txt > target.txt
Error - 'freeling' input format only accepts input analysis levels 'splitted', 'morfo', 'tagged', and 'senses'.
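The two error messages quoted so far amount to a small compatibility table between --input and --inplv. The sketch below encodes only what those messages state (it is not taken from FreeLing's documentation, and it ignores other formats such as conll or xml); `check_input_options` is a hypothetical name. Note that the default --input is "text", which is why "--inplv token" on its own fails.

```python
# Accepted --inplv values for each --input format, exactly as reported by the
# error messages quoted in this thread. Other formats are deliberately omitted.
ACCEPTED_LEVELS = {
    "text": {"text"},
    "freeling": {"splitted", "morfo", "tagged", "senses"},
}


def check_input_options(input_format, input_level):
    """Return True if the (format, level) pair is accepted per the messages above."""
    if input_format not in ACCEPTED_LEVELS:
        raise ValueError(f"format not covered by this sketch: {input_format!r}")
    return input_level in ACCEPTED_LEVELS[input_format]


if __name__ == "__main__":
    print(check_input_options("text", "token"))          # False: default format, first error
    print(check_input_options("freeling", "token"))      # False: the second error
    print(check_input_options("freeling", "splitted"))   # True
```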

Uhm, I see... That means you'll need to feed it text at the "splitted" level. The format is the same as for "token" (i.e., one token per line), but with an additional blank line after the end of each sentence.

So, if you just add a blank line after the end of each sentence (including the last one), it will work. (You'll have to use "--inplv splitted" in the command.)
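Converting a one-token-per-line file into this format can be automated. The sketch below assumes that a sentence ends at a token that is exactly ".", "?" or "!" (an assumption; adjust `SENTENCE_END` for your data). It drops any existing blank lines, so a file with a blank line after every word is also repaired, and it guarantees a blank line after the last sentence. The name `to_splitted` is hypothetical.

```python
# Tokens treated as sentence ends: an assumption, adjust for your corpus.
SENTENCE_END = {".", "?", "!"}


def to_splitted(tokens, sentence_end=SENTENCE_END):
    """Turn an iterable of tokens (one per line) into 'splitted' format text:
    one token per line, plus a blank line after every sentence."""
    lines = []
    for tok in tokens:
        tok = tok.strip()
        if not tok:
            continue  # ignore existing blank lines; boundaries are re-derived
        lines.append(tok)
        if tok in sentence_end:
            lines.append("")  # blank line closes the sentence
    if lines and lines[-1] != "":
        lines.append("")  # the last sentence also needs its blank line
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    example = ["El", "gat", "menja", "peix", ".", "El", "gos", "borda", "."]
    print(to_splitted(example), end="")
```

To apply it to a file, open it with an explicit encoding (e.g. `open("orig.txt", encoding="utf-8")`), pass the file object as `tokens`, and write the result to a new file that you then feed to `analyze --input freeling --inplv splitted -f ca.cfg`.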

 

 

The situation remains the same. My file now looks like this:
el

programa

compressor

Pkzip

But I still get this:

analyze --input freeling --inplv token -f ca.cfg < orig_bl.txt > target.txt
Error - 'freeling' input format only accepts input analysis levels 'splitted', 'morfo', 'tagged', and 'senses'.

Sorry, I hadn't read your comment thoroughly. It seems to be working now, and I think this also solved the problem I described in the other post, "Error on analysis". I had to change the tags like this, though: <contrac forma="al"> to Acontracformaigcialcdz, so FreeLing analyzes them like this: Acontracformaigcialcdz acontracformaigcialcdz NP00000 1

The input must be PLAIN TEXT. That means NO XML marks.
For instance, a correct input (for the options --inplv splitted --input freeling) would be:

El
gat
menja
peix
.

El
gos
borda
.

So, you need to remove all meta-information (such as "<contrac" or "form=") and leave ONLY THE TEXT: one word per line, and one blank line after each sentence. (I have already mentioned this several times... Please read my posts carefully and follow them before posting again.)
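If the source file carries XML-like markup, a regular expression can strip the tags before analysis. This is a minimal sketch, assuming tags never span more than one line; `strip_markup` is a hypothetical name. Be aware that anything stored only inside an attribute (for example the "al" in `forma="al"`) is discarded along with the tag, so if that value is the actual word, it has to be written out as plain text instead.

```python
import re

# Matches a single XML-like tag such as <contrac forma="al"> or </contrac>.
TAG = re.compile(r"<[^>]*>")


def strip_markup(lines):
    """Remove tags, keep one plain word per line, and preserve (collapsed)
    blank lines, which mark sentence ends in the splitted format."""
    out = []
    for line in lines:
        if not line.strip():
            if out and out[-1] != "":
                out.append("")  # keep a single blank line as sentence separator
            continue
        # Lines that held only markup become empty and contribute nothing.
        out.extend(TAG.sub(" ", line).split())
    return out


if __name__ == "__main__":
    sample = ['<contrac forma="al">', "el", "programa", "</contrac>", ".", ""]
    print("\n".join(strip_markup(sample)))
```

Replacing each tag with a space (rather than an empty string) keeps words that were adjacent only through markup, such as `<w>El</w><w>gat</w>`, from being glued together.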