Lemmas file

Submitted by Sebastià Salvà… on Mon, 11/16/2020 - 20:21

Dear all,

Do you know how to get, from a dicc.dat/dicc.txt file, a file with all the lemmas and all their word forms, so that I can load it into a concordance program such as AntConc?

casa -> casa, cases
menjar -> menjo, menjava, menjaré, mengin...

Could you recommend some kind of command, such as awk?

Sebastià Salvà…

Mon, 11/16/2020 - 20:30

Actually, the proper format should be:

casa TAB -> TAB casa TAB cases
menjar TAB -> TAB menjo TAB menjava TAB menjaré TAB mengin

(etc.)
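To pin down the TAB-separated layout described above, here is a minimal Python sketch that builds one such line from a lemma and its forms. `format_entry` is a hypothetical helper name for illustration, not part of FreeLing or AntConc.

```python
def format_entry(lemma, forms):
    """Return one output line: the lemma, a literal "->", then each form, all TAB-separated."""
    return "\t".join([lemma, "->", *forms])


# repr() makes the TAB characters visible when printed.
print(repr(format_entry("casa", ["casa", "cases"])))
# prints 'casa\t->\tcasa\tcases'
```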

Once you install FreeLing, a file is created at /usr/local/share/freeling/ca/dicc.src. It contains something very close to what you ask for, and it should be easy to adapt with a simple awk command, a small Python program, or even by loading the file into a spreadsheet.
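A minimal sketch of the "small Python program" route. It assumes that each entry line in dicc.src holds a word form followed by one or more lemma/tag pairs separated by whitespace (a hypothetical line would look like `cases casa NCFP000 casar VMIP2S0`); please check this against your own copy, since the layout may differ between FreeLing versions. Lines starting with `<` are treated as section markers and skipped, and, as a heuristic for skipping any header content, only lines with an odd number of fields (a form plus complete pairs) are kept. The names `lemmas_to_forms`, `lemmas.py` and `lemmas.txt`, and the UTF-8 encoding, are assumptions.

```python
import sys


def lemmas_to_forms(lines):
    """Group word forms by lemma, keeping the forms in first-seen order.

    Assumed entry layout: form lemma1 tag1 [lemma2 tag2 ...]
    """
    groups = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines and assumed section markers such as "<Entries>".
        if not fields or fields[0].startswith("<"):
            continue
        # Heuristic: a valid entry is a form plus complete lemma/tag pairs,
        # so it has an odd number of fields, at least 3.
        if len(fields) < 3 or len(fields) % 2 == 0:
            continue
        form = fields[0]
        # Lemmas sit at positions 1, 3, 5, ...; tags at 2, 4, 6, ...
        for lemma in fields[1::2]:
            forms = groups.setdefault(lemma, [])
            if form not in forms:  # the same form/lemma may repeat with other tags
                forms.append(form)
    return groups


def main(in_path, out_path):
    # UTF-8 is an assumption; adjust if your dictionary uses another encoding.
    with open(in_path, encoding="utf-8") as src:
        groups = lemmas_to_forms(src)
    with open(out_path, "w", encoding="utf-8") as dst:
        for lemma in sorted(groups):
            # lemma TAB -> TAB form1 TAB form2 ...
            dst.write("\t".join([lemma, "->", *groups[lemma]]) + "\n")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python lemmas.py dicc.src lemmas.txt", file=sys.stderr)
```

You would run it as `python lemmas.py /usr/local/share/freeling/ca/dicc.src lemmas.txt`. A form that belongs to several lemmas (for example `casa` as a noun and as a form of `casar`) is listed under each of them. `sorted()` orders lemmas by Unicode code point, so accented lemmas come after unaccented ones; drop the sort if you prefer the order of the original file.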

AWK is one option for what you want to do, not a requirement.

If you don't know AWK, you can use a small Python script, or Perl, or any other programming language.

If you are not a programmer, you can probably achieve the same result by loading the data into a spreadsheet such as Excel or OpenOffice.