Can't reproduce example outcome

Submitted by GregReese on Thu, 05/09/2019 - 15:03

I installed the Windows version of FreeLing 4.1. When I try the usage example shown at
(with mytext.txt and myconfig.cfg also on that page), I get output that is quite different from the expected results shown in the example. My output is:

El el DA0MS0 1
gato gato NCMS000 1
come comer VMIP3S0 0.978902 comer VMM02S0 0.021098
pescado pescado NCMS000 0.822581 pescar VMP00SM 0.177419
. . Fp 1

Pero pero CC 0.999902 pero NCMS000 9.84058e-05
a a SP 0.998775 a NCFS000 0.0012246
Don_Jaime don_jaime NP00000 1
no no RN 0.999297 no NCMS000 0.00070347
le le PP3CSD0 1
gustan gustar VMIP3P0 1
los el DA0MP0 0.992728 lo PP3MPA0 0.0072574 lo NCMP000 1.44858e-05
gatos gato NCMP000 1
. . Fp 1

Any suggestions on why there's a difference?

Also, it took three minutes and 18 seconds to process the input file, which has only thirteen words. Does this sound like a normal time?


Greg Reese

The difference is only in the probabilities.  The predicted PoS tags are basically the same.

This is because the example was run on an older version of FreeLing, and the probabilities have been retrained since then. If you use "--outlv tagged" (the option that sets the analysis level; "--output" selects the output format), the resulting PoS tags should be the same.
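If you want to check this mechanically, here is a minimal Python sketch that drops the probabilities and compares only the lemmas and tags. It assumes each token line looks like the output pasted above: the word form followed by (lemma, tag, probability) triples. The function name and the "older" probability value are made up for illustration; they are not part of FreeLing.

```python
def analyses_without_probs(output):
    """Return [(form, [(lemma, tag), ...]), ...], ignoring probabilities."""
    tokens = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank line between sentences
        form = fields[0]
        # After the form, fields come in (lemma, tag, probability) triples.
        pairs = [(fields[i], fields[i + 1])
                 for i in range(1, len(fields) - 2, 3)]
        tokens.append((form, pairs))
    return tokens


# Two lines taken from the output in the question.
mine = """come comer VMIP3S0 0.978902 comer VMM02S0 0.021098
pescado pescado NCMS000 0.822581 pescar VMP00SM 0.177419"""

# Same analyses with different probabilities (values invented for the demo).
older = """come comer VMIP3S0 0.75 comer VMM02S0 0.25
pescado pescado NCMS000 0.9 pescar VMP00SM 0.1"""

same = analyses_without_probs(mine) == analyses_without_probs(older)
print("Same lemmas and tags:", same)
```

If this prints True for your output and the one on the example page, the only difference is in the probabilities.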

Regarding the time:

The "analyzer" program is just a demo. When you launch it, it loads all the modules specified in the config file (even if you are not going to use them), and that takes some time (although 3 minutes looks like a lot, maybe you are on an old machine?).

If you want it to load faster, you can remove from the config file the lines for modules you don't want to use (e.g. WSD, parser, coref, etc.).

Once the modules are loaded, analysis is pretty fast. Try the same command, but do not redirect the input from mytext.txt. Instead, type sentences in the terminal. You'll see that the first sentence takes a long time (it is loading), but the following sentences are processed almost instantly.

The "analyzer" program can be useful to process large files (load once, process everything).
But to analyze small bits of text at a time, you should either use the "--server" mode for "analyze" or develop your own main program (see the tutorial) that loads only what you need, and reuses it as much as possible.
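
To illustrate the "load once, reuse" idea, here is a generic Python sketch. It does not use the FreeLing API: load_pipeline is a hypothetical stand-in that simulates slow module loading with a short sleep, and the "analysis" is just whitespace splitting. The point is only that the expensive step runs on the first call and is cached afterwards.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def load_pipeline():
    """Hypothetical stand-in for loading the analysis modules (slow)."""
    time.sleep(0.2)  # simulated loading cost, not a real measurement
    return lambda text: text.split()


def analyze_text(text):
    # The first call pays the loading cost; later calls reuse the cached pipeline.
    return load_pipeline()(text)


for sentence in ["El gato come pescado .", "Pero a Don_Jaime no le gustan los gatos ."]:
    start = time.perf_counter()
    tokens = analyze_text(sentence)
    elapsed = time.perf_counter() - start
    print(len(tokens), "tokens in", round(elapsed, 3), "s")
```

A server process or your own main program follows the same pattern: pay the loading cost once, then handle each request with the modules already in memory.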