Lemmas file

Submitted by Anonymous (not verified) on Mon, 11/16/2020 - 20:21


Dear all,

Do you know how to get, from a dicc.dat/dicc.txt file, a file with all the lemmas and all their token forms, in order to load it into a concordance program such as AntConc?

casa -> casa, cases
menjar -> menjo, menjava, menjaré, mengin...

Could you recommend some kind of instruction, such as awk...?


Actually, the proper format should be:

casa TAB -> TAB casa TAB cases
menjar TAB -> TAB menjo TAB menjava TAB menjaré TAB mengin

(etcetera...)

Once you install FreeLing, a file is created at /usr/local/share/freeling/ca/dicc.src, which contains something very close to what you ask for. It should be easy to adapt with a simple awk command, a small Python program, or even by loading the file into a spreadsheet.


Yes, an AWK command would be great. But I have no idea which one it would be.


AWK is an option for what you want to do, not a requirement.

If you don't know AWK, you can use a small Python script. Or Perl. Or any other programming language.

If you are not a programmer, you can probably achieve the same by loading the data into a spreadsheet such as Excel or OpenOffice.
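
For what it's worth, a small Python script along these lines might do it. This is only a sketch, and it assumes that each line of dicc.src has the layout "form lemma tag [lemma tag ...]" separated by whitespace; please check a few lines of your own file first. The function names, the output file name and the sample tags are made up for illustration.

```python
from collections import defaultdict


def group_forms_by_lemma(lines):
    """Collect word forms under each lemma.

    Assumes each line looks like: form lemma tag [lemma tag ...]
    """
    forms_by_lemma = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        form = fields[0]
        # Under the assumed layout, lemmas sit at positions 1, 3, 5, ...
        for lemma in fields[1::2]:
            if form not in forms_by_lemma[lemma]:
                forms_by_lemma[lemma].append(form)
    return forms_by_lemma


def format_lemma_list(forms_by_lemma):
    """Build lines in the 'lemma TAB -> TAB form TAB form' layout."""
    return [
        lemma + "\t->\t" + "\t".join(forms)
        for lemma, forms in sorted(forms_by_lemma.items())
    ]


def convert(src_path, out_path):
    """Read a dictionary file and write the lemma list (UTF-8 assumed)."""
    with open(src_path, encoding="utf-8") as src:
        grouped = group_forms_by_lemma(src)
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("\n".join(format_lemma_list(grouped)) + "\n")


# Illustrative input only; the tags here are placeholders.
sample = [
    "casa casa NCFS000",
    "cases casa NCFP000",
    "menjo menjar VMIP1S0",
    "menjava menjar VMII1S0",
]
result = format_lemma_list(group_forms_by_lemma(sample))
```

To run it on the real file you would call something like convert("/usr/local/share/freeling/ca/dicc.src", "lemmas.txt"), where lemmas.txt is just a hypothetical output name. If a form is listed under several lemmas, it ends up in the list of each of them.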
