Using the list of sentences provided by Dependencies with output_conll


Hello, I'm looking for some help getting CoNLL format for my dependencies output.
I'm running a pipeline where the input data passes through Morph → Tagger → Chunker → Dependencies. The dependencies step gives me output in its own dependency format, but I need it in CoNLL format. So I wrote a method that takes the list of sentences (ls) produced and printed by the dependencies step, and uses output_conll.cc to create an instance and call output_conll::PrintResults (wostream &sout, const list &ls). It's not working. Can I use the same 'ls' from the dependencies step and call conll->PrintResults(ls); right after PrintResults (ls);? Or should I pass the whole dependencies output file to output_conll::PrintResults(wostream &sout, const document &doc)? Or am I using it wrongly, and does that method need input in a certain format?

Thanks.

If you are using "analyzer", you just need to set the option "--output conll".

If you are using your own main program to call FreeLing, you don't need to print the results in one format and then convert them. You can print them directly in CoNLL format using the output_conll class.

The basic use of output_conll is very simple:
// create an output_conll handler
freeling::io::output_conll myout;
 
// assuming that "ls" contains a list of sentences that have
// been properly processed and parsed, you can print them
// in conll format simply by calling PrintResults:
myout.PrintResults(wcout, ls);
 
// if you want to print results to a file instead of stdout,
// you can replace "wcout" with any other wostream
// (e.g. an open file)

That will print the sentences in a default format.
You can choose which columns you want and in which order they appear by providing a configuration file to the output_conll constructor.
Details on the config file are provided in the user manual:
https://talp-upc.gitbooks.io/freeling-user-manual/content/modules/io.html

First of all, I want to thank you for the help and the fast response; it really helped me.
I'm working with the fl programs (fl1, ..., fl4_dependences), which form my pipeline.
The corpus is already trained for my language and the input file is UTF-8 encoded, so I just feed each fl# program with the output of the previous one, following the pipeline described in my first post.
However, the input to fl4_dependences is the output of fl1_tagger.

I edited fl4_dependences to print ls in CoNLL format inside the while (std::getline (std::wcin, text)) loop, and again after that loop, to process the last sentence left in the buffer, if any.
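The buffering pattern described above (collect tokens inside the while loop, then flush whatever is left after it) can be sketched without FreeLing. This is a minimal sketch, assuming the tagger output has one token per line and a blank line after each sentence; read_sentences is a hypothetical helper, and real code would build FreeLing word/sentence objects instead of plain strings.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: group one-token-per-line input into sentences.
// Assumes a blank line ends each sentence, as in tagger-style output.
std::vector<std::vector<std::wstring>> read_sentences(std::wistream& in) {
    std::vector<std::vector<std::wstring>> sentences;
    std::vector<std::wstring> buffer;  // tokens of the sentence being built
    std::wstring text;
    while (std::getline(in, text)) {
        if (!text.empty()) {
            buffer.push_back(text);       // still inside a sentence
        } else if (!buffer.empty()) {
            sentences.push_back(buffer);  // blank line: sentence complete
            buffer.clear();
        }
    }
    // After the loop: keep the last sentence if the input did not end
    // with a blank line; otherwise it would be silently lost.
    if (!buffer.empty()) sentences.push_back(buffer);
    return sentences;
}
```

For example, reading from a std::wistringstream holding L"este\ne\n\num\nficheiro" yields two sentences, the second one coming only from the flush after the loop.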

I think I got the correct output; however, I get some "modnorule" labels, like:

1 ... VMIF3S0 - - - - (grup-verb:2(verb:2 2 modnorule - -
2 ... VMN0000 - - - - (infinitiu:2(inf:2))) 0 top - -
... modnorule --

Is this normal? Is it because of the format of the sentence list?
Many thanks for the help!

The default behaviour of output_conll includes a column with chunking output (if you used the dep_txala parser). That is the column with "grup-verb", etc.

The next two columns (just before the final "- -") are the dependency parsing output (head and function, respectively).
"modnorule" means that the parser didn't have any rule to label that dependency (dep_txala is a rule-based parser). The reason may be a strange or ungrammatical sentence.
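To see how many dependencies came out as "modnorule", you can read the head and function columns straight from the CoNLL lines. This is a minimal sketch, assuming the 13-column layout shown in this thread (head in the 10th field, function in the 11th) and whitespace-separated fields; split_fields, dependency_of and count_label are hypothetical helpers, not part of FreeLing.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Head index and function label of one token.
struct Dependency {
    int head;           // 0 means the token is the root
    std::string label;  // e.g. "top", "modnorule", "nsubj"
};

// Splits a CoNLL line on whitespace. Assumes no field contains spaces,
// which holds for the outputs shown in this thread.
std::vector<std::string> split_fields(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> fields;
    std::string field;
    while (in >> field) fields.push_back(field);
    return fields;
}

// Assumes the head is the 10th field and the function the 11th.
// Returns head -1 and an empty label for lines that are too short;
// std::stoi throws if the head field is not a number.
Dependency dependency_of(const std::string& line) {
    std::vector<std::string> f = split_fields(line);
    if (f.size() < 11) return {-1, ""};
    return {std::stoi(f[9]), f[10]};
}

// Counts tokens whose function label equals `label`.
int count_label(const std::vector<std::string>& lines, const std::string& label) {
    int n = 0;
    for (const auto& line : lines)
        if (dependency_of(line).label == label) ++n;
    return n;
}
```

Applied to the 19-line sentence posted further down in this thread, count_label(lines, "modnorule") would report every token except the one labelled "top".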

In fact, if you use the "analyze" command with the same input, you'll see that its output includes "modnorule" too.

You can get more robust results if you use the dep_treeler parser instead of dep_txala.

Hello lluisp, thank you once again.
I think I'm already using dep_treeler, because I pass its configuration file when I run fl4_dependences.cc with the output of fl1_tagger:

./fl4_dependences /usr/local/share/freeling/pt/chunker/grammar-chunk.dat /usr/local/share/freeling/pt/dep_treeler/dependences.dat TryOutCONLL

I'm not using the analyze command at all. The problem is that I get many, many "modnorule" labels; the only one that isn't "modnorule" is "top".

Here is one simple output I got (in the dependencies format and in CoNLL format):


espec-ms/top/(este este DD0MS0) [
grup-verb/modnorule/(é ser VMIP3S0)
sn/modnorule/(ficheiro ficheiro NCMS000) [
espec-ms/modnorule/(um um DI0MS0)
]
sp-de/modnorule/(de de SP) [
sn/modnorule/(teste teste NCMS000)
]
sadv/modnorule/(aqui aqui RG)
grup-verb/modnorule/(temos ter VMIP1P0)
sn/modnorule/(texto texto NCMS000) [
espec-ms/modnorule/(um um DI0MS0)
]
prel/modnorule/(que que PR0CN00)
grup-verb/modnorule/(servir servir VMN0000) [
VMIP3S0/modnorule/(vai ir VMIP3S0)
]
sp-de/modnorule/(de de SP) [
sn/modnorule/(teste teste NCMS000)
]
grup-sp/modnorule/(para para SP) [
sn/modnorule/(programa programa NCMS000) [
espec-ms/modnorule/(este este DD0MS0)
n-fs/modnorule/(. . NCFS000)
]
]
]

1 este este DD0MS0 - - - - (espec-ms:1(dem-ms:1) 0 top - -
2 é ser VMIP3S0 - - - - (grup-verb:2(verb:2)) 1 modnorule - -
3 um um DI0MS0 - - - - (sn:4(espec-ms:3(indef-ms:3)) 4 modnorule - -
4 ficheiro ficheiro NCMS000 - - - - (grup-nom-ms:4(n-ms:4))) 1 modnorule - -
5 de de SP - - - - (sp-de:5 1 modnorule - -
6 teste teste NCMS000 - - - - (sn:6(grup-nom-ms:6(n-ms:6)))) 5 modnorule - -
7 aqui aqui RG - - - - (sadv:7) 1 modnorule - -
8 temos ter VMIP1P0 - - - - (grup-verb:8(verb:8)) 1 modnorule - -
9 um um DI0MS0 - - - - (sn:10(espec-ms:9(indef-ms:9)) 10 modnorule - -
10 texto texto NCMS000 - - - - (grup-nom-ms:10(n-ms:10))) 1 modnorule - -
11 que que PR0CN00 - - - - (prel:11) 1 modnorule - -
12 vai ir VMIP3S0 - - - - (grup-verb:13(verb:13 13 modnorule - -
13 servir servir VMN0000 - - - - (infinitiu:13(inf:13)))) 1 modnorule - -
14 de de SP - - - - (sp-de:14 1 modnorule - -
15 teste teste NCMS000 - - - - (sn:15(grup-nom-ms:15(n-ms:15)))) 14 modnorule - -
16 para para SP - - - - (grup-sp:16(prep:16) 1 modnorule - -
17 este este DD0MS0 - - - - (sn:18(espec-ms:17(dem-ms:17)) 18 modnorule - -
18 programa programa NCMS000 - - - - (grup-nom-ms:18(n-ms:18) 16 modnorule - -
19 . . NCFS000 - - - - (n-fs:19))))) 18 modnorule - -

Thank you once again for the help! it means a lot to me!

There are two dependency parsers in FreeLing:

  • dep_txala: rule-based parser. Requires the chunker (class chart_parser), used the same way fl4_dependences.cc does.
  • dep_treeler: machine-learning parser. Requires neither the chunker nor dep_txala.

There is no dep_txala grammar for portuguese, so you must use dep_treeler for this language.
This means that you have to change the code in fl4_dependences to create an instance of dep_treeler class, instead of dep_txala.
Since dep_treeler does not require the chunker, you don't need to instantiate or call the chart_parser class.

I mentioned "analyzer" because it is the default program for FreeLing and the easiest way to find out which results FreeLing produces in a given scenario.
You can use it to check whether your results are what they should be.
If your program gets the same results as "analyzer", you are doing well. If you get different results, your program has a bug.
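A quick way to do that check is to run both programs on the same input and compare the CoNLL outputs line by line. This is a minimal sketch, assuming both outputs are already loaded with one token per line; normalize and first_difference are hypothetical helpers that ignore spacing differences (tabs vs. spaces) and report the 1-based number of the first differing line, or 0 if the outputs match.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Collapses any run of spaces/tabs into single spaces so that
// formatting differences do not count as mismatches.
std::string normalize(const std::string& line) {
    std::istringstream in(line);
    std::string field, out;
    while (in >> field) {
        if (!out.empty()) out += ' ';
        out += field;
    }
    return out;
}

// Returns the 1-based index of the first line that differs between the
// two outputs, or 0 if they are identical (including line count).
std::size_t first_difference(const std::vector<std::string>& mine,
                             const std::vector<std::string>& reference) {
    std::size_t n = mine.size() < reference.size() ? mine.size() : reference.size();
    for (std::size_t i = 0; i < n; ++i)
        if (normalize(mine[i]) != normalize(reference[i])) return i + 1;
    if (mine.size() != reference.size()) return n + 1;  // one output is longer
    return 0;
}
```

Comparing whole lines also catches tag differences (e.g. DD0MS0 vs. PD0MS00), not only different heads or labels.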

For instance, the command:
echo "este é um ficheiro de teste." | analyze -f pt.cfg --outlv dep
produces
root/(ficheiro ficheiro NCMS000 -) [
nsubj/(este este PD0MS00 -)
cop/(é ser VMIP3S0 -)
nsubj/(um um DI0MS0 -)
nmod/(teste teste NCMS000 -) [
case/(de de SP -)
]
nmod/(. . Fp -)
]

And the command
echo "este é um ficheiro de teste." | analyze -f pt.cfg --outlv dep --output conll
produces
1 este este PD0MS00 - - - - - 4 nsubj - -
2 é ser VMIP3S0 - - - - - 4 cop - -
3 um um DI0MS0 - - - - - 4 nsubj - -
4 ficheiro ficheiro NCMS000 - - - - - 0 root - -
5 de de SP - - - - - 6 case - -
6 teste teste NCMS000 - - - - - 4 nmod - -
7 . . Fp - - - - - 4 nmod - -
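The two analyzer outputs above encode the same tree: in CoNLL each token stores the index of its head (0 for the root), and the bracketed format prints each token's dependents indented under it. This is a minimal sketch of that conversion, assuming tokens are given as (form, head, label) triples in CoNLL order and that every head is in the range 0..n; print_tree is a hypothetical function, and its layout only approximates FreeLing's own printer (no lemma or tag).

```cpp
#include <sstream>
#include <string>
#include <vector>

// One token of a CoNLL sentence (1-based ids are implied by position).
struct Token {
    std::string form;
    int head;           // 0 = root
    std::string label;
};

// Recursively prints node `id` and its dependents, two spaces per level.
void print_node(const std::vector<Token>& toks,
                const std::vector<std::vector<int>>& children,
                int id, int depth, std::ostringstream& out) {
    const Token& t = toks[id - 1];
    out << std::string(2 * depth, ' ') << t.label << "/(" << t.form << ")";
    if (children[id].empty()) { out << "\n"; return; }
    out << " [\n";
    for (int c : children[id]) print_node(toks, children, c, depth + 1, out);
    out << std::string(2 * depth, ' ') << "]\n";
}

// Builds child lists from the head column (index 0 is the virtual root)
// and prints the tree starting from the token(s) whose head is 0.
std::string print_tree(const std::vector<Token>& toks) {
    std::vector<std::vector<int>> children(toks.size() + 1);
    for (std::size_t i = 0; i < toks.size(); ++i)
        children[toks[i].head].push_back(static_cast<int>(i) + 1);
    std::ostringstream out;
    for (int r : children[0]) print_node(toks, children, r, 0, out);
    return out.str();
}
```

Fed with the seven tokens above, it prints root/(ficheiro) with its dependents este, é, um, teste (with de under it) and "." in the same order as the analyzer's tree.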

Finally, note that the programs fl1_, fl2_, etc. do the same work as "analyzer". They are only simple examples that illustrate how to use the modules individually.

Unless you have very specific needs that require building your own program, you might be better off simply using "analyzer". It has many options that let you customize which modules are used, with which configuration files, which output format is produced, etc.

Hello lluisp, thank you for the reply; in fact, analyzer gives good output. Now I'm trying to do what you suggested: create an instance of dep_treeler and remove the chart parser.
But I may be doing something wrong, because I get empty output.
What I've done is comment out all the chart_parser instances and create the new dep from argv[1].


dependency_parser *dep = NULL;
//chart_parser *parser = NULL;
...
//parser = new chart_parser(util::string2wstring(argv[1]));
//dep = new dep_txala (util::string2wstring(argv[2]), parser->get_start_symbol ());

dep = new dep_treeler (util::string2wstring(argv[1]));
...
ls.push_back (av);
//parser->analyze (ls);
dep->analyze (ls);
PrintResults (ls);
...

Analyzer really gives great output, but it would be great if I could do it in fl4_dependences.

Once again, thank you for the help!
Bruno.

Your program looks good.

If it is not printing what you want, it is probably because the function PrintResults in your program prints the tree in FreeLing's parenthesized format.
You just need to replace the call to PrintResults with a call to output_conll::PrintResults, as I described in my first reply.

If you are having a different problem, please be more specific: post the command you run, the input sentence, the output you get, and the output you expected.