Linguistic Data https://nlp.lsi.upc.edu/freeling/ en Version of sense dictionary for Spanish? https://nlp.lsi.upc.edu/freeling/node/712 <span property="schema:name" class="field field--name-title field--type-string field--label-hidden">Version of sense dictionary for Spanish?</span> <span rel="schema:author" class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/freeling/user/7215" typeof="schema:Person" property="schema:name" datatype="">ulrike.henny</span></span> <span property="schema:dateCreated" content="2021-12-14T14:41:59+00:00" class="field field--name-created field--type-created field--label-hidden">Tue, 12/14/2021 - 15:41</span> <div class="field field--name-taxonomy-forums field--type-entity-reference field--label-above"> <div class="field__label">Forums</div> <div class="field__item"><a href="/freeling/taxonomy/term/4" hreflang="en">Linguistic Data</a></div> </div> <div property="schema:text" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>Hello,<br /> I have been using FreeLing for some time now to annotate literary texts in Spanish. Thank you very much for this great and very useful tool!<br /> Today I have a question about the version of the sense dictionary for Spanish that is used in FreeLing. According to the documentation, the sense dictionary from the MCR 3.0 project is used. 
However, I noticed that some words that I would have expected to be annotated with sense information are not.<br /> For example, the lemma "callejuela" is contained in MCR 3.0, as can be seen through the web interface at <a href="https://adimen.si.ehu.es/cgi-bin/wei/public/wei.consult.perl?item=callejuela&amp;button1=Look_up&amp;metode=Word&amp;pos=Nouns&amp;llengua=Spanish_3.0&amp;search=near_synonym&amp;estructura=English_3.0&amp;glos=Gloss&amp;levin=1&amp;spa-30=Spanish_3.0">https://adimen.si.ehu.es/cgi-bin/wei/public/wei.consult.perl?item=calle…</a>. However, this word is not annotated with sense information when I use the FreeLing analyzer. An example of a sentence where it occurs is "Ibamos a salir de una callejuela formada con sacos de harina y cajas de fideos."<br /> I used the following command to call the analyzer: analyze -f es.cfg --outlv tagged --sense ukb --nec --output xml &lt; nh0010.txt &gt; nh0010.xml<br /> My version of FreeLing is 4.0 and I use it on Ubuntu 20.04.3 LTS.<br /> I was wondering whether the latest version of the MCR is used in FreeLing or maybe an older version with fewer senses. 
Of course I would consider upgrading to the latest version of FreeLing if that would make a difference, but I have not done so yet because I could not find any comments on this issue in the news about the latest versions.<br /> Best regards,<br /> Ulrike</p> </div> <section class="field field--name-comment-forum field--type-comment field--label-hidden comment-wrapper"> <article role="article" data-comment-user-id="63" id="comment-794" class="comment js-comment clearfix"> <span class="hidden" data-comment-timestamp="1639643328"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/63" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/63" typeof="schema:Person" property="schema:name" datatype="">lluisp</span></p> <p class="comment__time">Thu, 12/16/2021 - 09:28</p> <p class="comment__permalink"><a href="/freeling/comment/794#comment-794" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/794#comment-794" class="permalink" rel="bookmark" hreflang="en">4.0 is a bit outdated...  It…</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>4.0 is a bit outdated...  
It is very likely that it includes an older version of MCR that does not contain "callejuela".</p> <p>An easy way to fix it is just to replace the file senses30.src with the one from a newer FreeLing version</p> <p>Also, upgrading to a more recent version will solve this and many other issues you may have.</p></div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=794&amp;1=default&amp;2=en&amp;3=" token="95k6nwWXsBZ2B1g4R5rUF-f6jJYViKtvqnE-sFGPdec"></drupal-render-placeholder> </div> </article> <article role="article" data-comment-user-id="7215" id="comment-799" class="comment js-comment by-node-author clearfix"> <span class="hidden" data-comment-timestamp="1639802706"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/index.php/user/7215" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/7215" typeof="schema:Person" property="schema:name" datatype="">ulrike.henny</span></p> <p class="comment__time">Sat, 12/18/2021 - 05:45</p> <p class="comment__permalink"><a href="/freeling/comment/799#comment-799" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/799#comment-799" class="permalink" rel="bookmark" hreflang="en">Thank you very much! I…</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>Thank you very much! I removed version 4.0 and installed 4.2 and now indeed more senses are annotated than before, including "callejuela". It is also good to know where the file with the senses is.</p> <p>Just for information, now I get some warnings of the following type: </p> <p>Unknown synset 80000054-a ignored. 
Please check consistency between sense dictionary and KB</p> <p>But I already found your discussion on this issue (<a href="https://githubmemory.com/repo/TALP-UPC/FreeLing/issues/109">https://githubmemory.com/repo/TALP-UPC/FreeLing/issues/109</a>).</p> <p>Thank you for your quick response and help!</p> </div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=799&amp;1=default&amp;2=en&amp;3=" token="aGMZHpXsn6-Wvj9zGwd7njwt44jqKGsy2xJDZOP41CI"></drupal-render-placeholder> </div> </article> <div class="indented"><article role="article" data-comment-user-id="63" id="comment-802" class="comment js-comment clearfix"> <span class="hidden" data-comment-timestamp="1639988165"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/63" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/63" typeof="schema:Person" property="schema:name" datatype="">lluisp</span></p> <p class="comment__time">Mon, 12/20/2021 - 09:16</p> <p class="comment__permalink"><a href="/freeling/comment/802#comment-802" hreflang="en">Permalink</a></p> <p class="visually-hidden">In reply to <a href="/freeling/comment/799#comment-799" class="permalink" rel="bookmark" hreflang="en">Thank you very much! 
I…</a> by <span lang="" about="/freeling/user/7215" typeof="schema:Person" property="schema:name" datatype="">ulrike.henny</span></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/802#comment-802" class="permalink" rel="bookmark" hreflang="en">That synset corresponds to …</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>That synset corresponds to "human" when used as an adjective, which seems to have been added to esWN, but not to other languages:</p> <p>$ grep 80000054 data/es/senses30.src<br /> 80000054-a humano</p> <p>The graph structure is common to all languages and does not include any relation for that synset, so the WSD algorithm complains about it.</p> <p>$ grep 80000054 data/common/xwn.dat<br /> 01835496-v 80000054-v<br /> 08524735-n 80000054-n</p></div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=802&amp;1=default&amp;2=en&amp;3=" token="fE6sxNe93HepnenJxJ7FEkPq6KkKCWoCfAweI4-t7fk"></drupal-render-placeholder> </div> </article> </div> </section> Tue, 14 Dec 2021 14:41:59 +0000 ulrike.henny 712 at https://nlp.lsi.upc.edu/freeling https://nlp.lsi.upc.edu/freeling/node/712#comments Verbs suddenly missing in Freeling dictionary (PT)? 
https://nlp.lsi.upc.edu/freeling/node/699 <span property="schema:name" class="field field--name-title field--type-string field--label-hidden">Verbs suddenly missing in Freeling dictionary (PT)?</span> <span rel="schema:author" class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/freeling/user/7679" typeof="schema:Person" property="schema:name" datatype="">pbholmen</span></span> <span property="schema:dateCreated" content="2020-12-09T18:39:18+00:00" class="field field--name-created field--type-created field--label-hidden">Wed, 12/09/2020 - 19:39</span> <div class="field field--name-taxonomy-forums field--type-entity-reference field--label-above"> <div class="field__label">Forums</div> <div class="field__item"><a href="/freeling/taxonomy/term/4" hreflang="en">Linguistic Data</a></div> </div> <div property="schema:text" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>Hi</p> <p>I'm using the Portuguese Freeling dictionary data file for a tool for personal use. I just upgraded Freeling using Homebrew on macOS, the current version is stated as 4.2_1</p> <p>A very common verb, "resolver" is suddenly missing. I looked in the GitHub repo, and see that it is also missing in <code>FreeLing/data/pt/dictionary/entries/verbs</code>.</p> <p>I looked at the History for the file and found three commits, one named "Initial load version 4.0 (beta1)", in this version the verb is there. The second commit is named "new pt data", also here it is included, and the third commit, made on May 4 2016, named "improves in dict solving" does not include the verb.</p> <p>I understand that it's not meant to be a data source for arbitrary tools, so I wonder is this a regression or meant to be that way? Is the verb handled some other place in the Freeling system? 
Or have I accidentally installed a development version and not an official release?</p> </div> <section class="field field--name-comment-forum field--type-comment field--label-hidden comment-wrapper"> <article role="article" data-comment-user-id="63" id="comment-730" class="comment js-comment clearfix"> <span class="hidden" data-comment-timestamp="1607610033"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/63" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/63" typeof="schema:Person" property="schema:name" datatype="">lluisp</span></p> <p class="comment__time">Thu, 12/10/2020 - 15:20</p> <p class="comment__permalink"><a href="/freeling/comment/730#comment-730" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/730#comment-730" class="permalink" rel="bookmark" hreflang="en">it may just be it was…</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>it may just be it was deleted by mistake.</p> <p>If you improve the dictionary entries in <code>data/pt/dictionary/entries</code>, you're welcome to pull request them</p> <p>thanks!</p></div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=730&amp;1=default&amp;2=en&amp;3=" token="eCeoTGqfWqyq_kioPG4tJMonNhAWY2dzleKfHGJyRTY"></drupal-render-placeholder> </div> </article> </section> Wed, 09 Dec 2020 18:39:18 +0000 pbholmen 699 at https://nlp.lsi.upc.edu/freeling https://nlp.lsi.upc.edu/freeling/node/699#comments Error in the Portuguese dictionary? 
https://nlp.lsi.upc.edu/freeling/node/698 <span property="schema:name" class="field field--name-title field--type-string field--label-hidden">Error in the Portuguese dictionary?</span> <span rel="schema:author" class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/freeling/user/7679" typeof="schema:Person" property="schema:name" datatype="">pbholmen</span></span> <span property="schema:dateCreated" content="2020-12-09T12:04:28+00:00" class="field field--name-created field--type-created field--label-hidden">Wed, 12/09/2020 - 13:04</span> <div class="field field--name-taxonomy-forums field--type-entity-reference field--label-above"> <div class="field__label">Forums</div> <div class="field__item"><a href="/freeling/taxonomy/term/4" hreflang="en">Linguistic Data</a></div> </div> <div property="schema:text" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>Hi</p> <p>I think there might be an error in the dictionary found under:<br /> <code>share/freeling/pt/dicc.src</code></p> <p>Line 232708 reads:</p> <p><code>dar dar VMN0000 dar VMN01S0 dar VMN03S0 dar VMSF1S0 dar VMSF3S0</code></p> <p>I believe this means that "dar" is supposed to be the future subjunctive of the verb "dar" for third and first person singular. This would be the case if the verb was regular, however it is irregular. 
Here is an authoritative source for the correct conjugation:</p> <p><a href="http://www.portaldalinguaportuguesa.org/index.php?action=lemma&amp;lemma=35510">http://www.portaldalinguaportuguesa.org/index.php?action=lemma&amp;lemma=35…</a></p> <p>I don't know enough about Freeling development to create a pull request myself, in fact I even seem to recall the file is generated during install(?), however I have made a diff between the installed dicc.src, (Portuguese, Freeling version 4.2.1), and my own where I have corrected it:</p> <p><code><br /> 232708c232708<br /> &lt; dar dar VMN0000 dar VMN01S0 dar VMN03S0 dar VMSF1S0 dar VMSF3S0<br /> ---<br /> &gt; dar dar VMN0000 dar VMN01S0 dar VMN03S0<br /> 232831c232831<br /> &lt; dardes dar VMN02P0 dar VMSF2P0 dardar VMM02S0 dardar VMSP2S0<br /> ---<br /> &gt; dardes dar VMN02P0 dardar VMM02S0 dardar VMSP2S0<br /> 232855c232855<br /> &lt; darem dar VMN03P0 dar VMSF3P0<br /> ---<br /> &gt; darem dar VMN03P0<br /> 232857c232857<br /> &lt; dares dar VMN02S0 dar VMSF2S0 dares NCMP000<br /> ---<br /> &gt; dares dar VMN02S0 dares NCMP000<br /> 232864c232864<br /> &lt; darmos dar VMN01P0 dar VMSF1P0<br /> ---<br /> &gt; darmos dar VMN01P0<br /> 244702a244703<br /> &gt; der dar VMSF1S0 dar VMSF3S0<br /> 244711a244713,244715<br /> &gt; derdes dar VMSF2P0<br /> &gt; derem dar VMSF3P0<br /> &gt; deres dar VMSF2S0<br /> 244831a244836<br /> &gt; dermos dar VMSF1P0<br /> </code></p> <p>I hope this comes out correctly in the post. 
More info: I'm using the Homebrew installation of Freeling on macOS.</p> </div> <section class="field field--name-comment-forum field--type-comment field--label-hidden comment-wrapper"> <article role="article" data-comment-user-id="63" id="comment-729" class="comment js-comment clearfix"> <span class="hidden" data-comment-timestamp="1607609895"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/63" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/63" typeof="schema:Person" property="schema:name" datatype="">lluisp</span></p> <p class="comment__time">Thu, 12/10/2020 - 15:18</p> <p class="comment__permalink"><a href="/freeling/comment/729#comment-729" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/729#comment-729" class="permalink" rel="bookmark" hreflang="en">these files are generated…</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>These files are generated during install by putting together the entries in data/pt/dictionary/entries/verbs.</p> <p>If you update that file, you can open a pull request.</p> <p>However, make sure you do not remove forms that are valid in some Portuguese variant (Portugal or Brazil).</p> <p>Thanks!</p></div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=729&amp;1=default&amp;2=en&amp;3=" token="AgG-IMQvSGSJGrnKWEFyGeuZSOETwzolGyqZEJjFUyM"></drupal-render-placeholder> </div> </article> </section> Wed, 09 Dec 2020 12:04:28 +0000 pbholmen 698 at https://nlp.lsi.upc.edu/freeling https://nlp.lsi.upc.edu/freeling/node/698#comments Lemas file https://nlp.lsi.upc.edu/freeling/node/695 <span property="schema:name" class="field field--name-title field--type-string field--label-hidden">Lemas file</span> <span rel="schema:author" class="field field--name-uid field--type-entity-reference 
field--label-hidden"><span lang="" about="/freeling/user/7670" typeof="schema:Person" property="schema:name" datatype="" content="Sebastià Salvà i Puig">Sebastià Salvà…</span></span> <span property="schema:dateCreated" content="2020-11-16T19:21:36+00:00" class="field field--name-created field--type-created field--label-hidden">Mon, 11/16/2020 - 20:21</span> <div class="field field--name-taxonomy-forums field--type-entity-reference field--label-above"> <div class="field__label">Forums</div> <div class="field__item"><a href="/freeling/taxonomy/term/4" hreflang="en">Linguistic Data</a></div> </div> <div property="schema:text" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>Dear all,</p> <p>Do you know how to get, from a dicc.dat/dicc.txt file, a file with all the lemmas and all their token forms, in order to load it into a concordance program such as AntConc?</p> <p>casa -&gt; casa, cases<br /> menjar -&gt; menjo, menjava, menjaré, mengin...</p> <p>Could you recommend some kind of command, such as awk...?</p> </div> <section class="field field--name-comment-forum field--type-comment field--label-hidden comment-wrapper"> <article role="article" data-comment-user-id="7670" id="comment-715" class="comment js-comment by-node-author clearfix"> <span class="hidden" data-comment-timestamp="1605555010"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/7670" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/7670" typeof="schema:Person" property="schema:name" datatype="" content="Sebastià Salvà i Puig">Sebastià Salvà…</span></p> <p class="comment__time">Mon, 11/16/2020 - 20:30</p> <p class="comment__permalink"><a href="/freeling/comment/715#comment-715" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/715#comment-715" class="permalink" rel="bookmark" 
hreflang="en">Actually, the proper format…</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>Actually, the proper format should be:</p> <p>casa TAB -&gt; TAB casa TAB cases<br /> menjar TAB -&gt; TAB menjo TAB menjava TAB menjaré TAB mengin</p> <p>(etcetera...)</p> </div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=715&amp;1=default&amp;2=en&amp;3=" token="eACKh18Ec5QWyJgMf3f2nPgqPB3s5uVdP9rWmzsPNaY"></drupal-render-placeholder> </div> </article> <article role="article" data-comment-user-id="63" id="comment-716" class="comment js-comment clearfix"> <span class="hidden" data-comment-timestamp="1605686790"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/63" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/63" typeof="schema:Person" property="schema:name" datatype="">lluisp</span></p> <p class="comment__time">Wed, 11/18/2020 - 09:06</p> <p class="comment__permalink"><a href="/freeling/comment/716#comment-716" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/716#comment-716" class="permalink" rel="bookmark" hreflang="en">Once you install FreeLing, a…</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>Once you install FreeLing, a file is created in /usr/local/share/freeling/ca/dicc.src, which contains something very close to what you ask, and that should be easy to adapt with a simple awk command, a small python program, or even loading the file in a spreadsheet</p></div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=716&amp;1=default&amp;2=en&amp;3=" token="5EaF0_jnZXDHBxgGFqILdS1DRIuUfor_quKORD_l2Qo"></drupal-render-placeholder> </div> </article> <article role="article" 
data-comment-user-id="7670" id="comment-723" class="comment js-comment by-node-author clearfix"> <span class="hidden" data-comment-timestamp="1606163492"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/7670" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/7670" typeof="schema:Person" property="schema:name" datatype="" content="Sebastià Salvà i Puig">Sebastià Salvà…</span></p> <p class="comment__time">Mon, 11/23/2020 - 21:31</p> <p class="comment__permalink"><a href="/freeling/comment/723#comment-723" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/723#comment-723" class="permalink" rel="bookmark" hreflang="en">Yes, an AWK command would be…</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>Yes, an AWK command would be great. But I have no idea of which would be the one.</p> </div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=723&amp;1=default&amp;2=en&amp;3=" token="FlueOoCAQJr97TbV-PU1j4KJIvmu45QpapvmshFMV4U"></drupal-render-placeholder> </div> </article> <article role="article" data-comment-user-id="63" id="comment-724" class="comment js-comment clearfix"> <span class="hidden" data-comment-timestamp="1606316091"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/63" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/63" typeof="schema:Person" property="schema:name" datatype="">lluisp</span></p> <p class="comment__time">Wed, 11/25/2020 - 15:54</p> <p class="comment__permalink"><a href="/freeling/comment/724#comment-724" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/724#comment-724" class="permalink" rel="bookmark" hreflang="en">AWK is an option for what…</a></h3> 
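<p>A minimal Python sketch of the conversion discussed in this thread. It assumes each line of dicc.src has the layout visible in the Portuguese dictionary excerpts quoted elsewhere in this forum (a word form followed by whitespace-separated lemma/tag pairs) and that the file is UTF-8 encoded. The function names and the output file name are illustrative and not part of FreeLing.</p>

```python
from collections import defaultdict


def invert_dictionary(lines):
    """Map each lemma to the sorted list of word forms listed for it.

    Assumed line layout: form lemma1 tag1 lemma2 tag2 ...
    """
    forms_by_lemma = defaultdict(set)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        form = fields[0]
        # Lemmas sit at positions 1, 3, 5, ...; tags at 2, 4, 6, ...
        for lemma in fields[1::2]:
            forms_by_lemma[lemma].add(form)
    return {lemma: sorted(forms) for lemma, forms in forms_by_lemma.items()}


def format_entry(lemma, forms):
    """Render one entry as: lemma TAB -> TAB form TAB form ..."""
    return "\t".join([lemma, "->"] + list(forms))


def write_lemma_table(src_path, out_path):
    """Read a dicc.src-style file and write one line per lemma."""
    with open(src_path, encoding="utf-8") as src:
        table = invert_dictionary(src)
    with open(out_path, "w", encoding="utf-8") as out:
        for lemma in sorted(table):
            out.write(format_entry(lemma, table[lemma]) + "\n")
```

<p>Calling write_lemma_table("/usr/local/share/freeling/ca/dicc.src", "lemma_forms.txt") would write one line per lemma in the "lemma TAB -&gt; TAB form TAB form" layout requested earlier in the thread. It is worth checking a few entries by hand before loading the result into AntConc.</p>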
<div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>AWK is an option for what you want to do, not a requirement.</p> <p>If you don't know AWK, you can use a small Python script. Or Perl. Or any other programming language.</p> <p>If you are not a programmer, you can probably achieve the same by loading the data into a spreadsheet such as Excel or OpenOffice.</p></div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=724&amp;1=default&amp;2=en&amp;3=" token="VKr-p0RwFxukGc8Cx0fHFzxdTQWQ6Q26YqVgdcm9QM0"></drupal-render-placeholder> </div> </article> </section> Mon, 16 Nov 2020 19:21:36 +0000 Sebastià Salvà i Puig 695 at https://nlp.lsi.upc.edu/freeling https://nlp.lsi.upc.edu/freeling/node/695#comments Personal articles in Catalan: "En Pere" & "na Maria" https://nlp.lsi.upc.edu/freeling/node/694 <span property="schema:name" class="field field--name-title field--type-string field--label-hidden">Personal articles in Catalan: &quot;En Pere&quot; &amp; &quot;na Maria&quot;</span> <span rel="schema:author" class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/freeling/user/7670" typeof="schema:Person" property="schema:name" datatype="" content="Sebastià Salvà i Puig">Sebastià Salvà…</span></span> <span property="schema:dateCreated" content="2020-11-16T19:14:22+00:00" class="field field--name-created field--type-created field--label-hidden">Mon, 11/16/2020 - 20:14</span> <div class="field field--name-taxonomy-forums field--type-entity-reference field--label-above"> <div class="field__label">Forums</div> <div class="field__item"><a href="/freeling/taxonomy/term/4" hreflang="en">Linguistic Data</a></div> </div> <div property="schema:text" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>Dear all,</p> <p>Any idea or suggestion for modifying 
some Catalan FreeLing language DAT file(s) in order to get the proper tokenization and labeling for the personal articles before a proper name in Catalan?</p> <p>For instance: "en Pere" should be labeled "en DA0MS | Pere NP00SP0", not as a prepositional phrase ("en SP | Pere NP00SP0"). And "na Maria" should be tokenized as two words and tagged as "na DA0FS | Maria_NP", not as "na_Maria_NP00SP0" (with a "_" merging the article and the proper name).</p> </div> <section class="field field--name-comment-forum field--type-comment field--label-hidden comment-wrapper"> <article role="article" data-comment-user-id="63" id="comment-717" class="comment js-comment clearfix"> <span class="hidden" data-comment-timestamp="1605687692"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/63" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/63" typeof="schema:Person" property="schema:name" datatype="">lluisp</span></p> <p class="comment__time">Wed, 11/18/2020 - 09:21</p> <p class="comment__permalink"><a href="/freeling/comment/717#comment-717" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/717#comment-717" class="permalink" rel="bookmark" hreflang="en">The tagger is statistical,…</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>The tagger is statistical, and the frequency of "en" as a DA is 0.08%, which makes it very unlikely that this reading is selected.</p> <p>However, you may try biasing that decision as follows:</p> <p>Edit the file /usr/local/share/freeling/ca/tagger.dat, and in section &lt;Forbidden&gt; add a new line:</p> <p> *.SP&lt;en&gt;.NP</p> <p>However, be aware that this will cause "en" to *never* be tagged as a preposition before an NP, which might introduce errors in other sentences.</p> <p>Regarding "Na_Marta", that happens because the 
default NE recognizer is also a ML model, and has probably not seen that case often.  You can solve that using an alternative NE recognizer:  Add "--fnp /usr/local/share/freeling/ca/np.dat" to your analyzer command.</p> <p>However: If "Na" is capitalized and not at the beginning of the sentence, it will be considered part of the NP by any NER system.</p> <p> </p></div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=717&amp;1=default&amp;2=en&amp;3=" token="PsIsgo8gVLgrWda9h8Xfnwe_35l-G18TkjqmkdjKe0c"></drupal-render-placeholder> </div> </article> <article role="article" data-comment-user-id="7670" id="comment-722" class="comment js-comment by-node-author clearfix"> <span class="hidden" data-comment-timestamp="1606163341"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/7670" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/7670" typeof="schema:Person" property="schema:name" datatype="" content="Sebastià Salvà i Puig">Sebastià Salvà…</span></p> <p class="comment__time">Mon, 11/23/2020 - 21:29</p> <p class="comment__permalink"><a href="/freeling/comment/722#comment-722" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/722#comment-722" class="permalink" rel="bookmark" hreflang="en">Thanks for your answer…</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>Thanks for your answer.</p> <p>Could I also try to introduce this other exception: "*.SP.NP00SP0", in order just to exclude the person proper names (not the place proper names)?</p> </div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=722&amp;1=default&amp;2=en&amp;3=" token="p5JiAyOig2LD_K7dJWCJreFkwW7T-k3EFPyOMprD3Mc"></drupal-render-placeholder> </div> </article> </section> Mon, 16 Nov 2020 19:14:22 +0000 Sebastià 
Salvà i Puig 694 at https://nlp.lsi.upc.edu/freeling https://nlp.lsi.upc.edu/freeling/node/694#comments Demo is not working https://nlp.lsi.upc.edu/freeling/node/654 <span property="schema:name" class="field field--name-title field--type-string field--label-hidden">Demo is not working</span> <span rel="schema:author" class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/freeling/user/7299" typeof="schema:Person" property="schema:name" datatype="">Sara_Saad</span></span> <span property="schema:dateCreated" content="2018-07-24T13:12:33+00:00" class="field field--name-created field--type-created field--label-hidden">Tue, 07/24/2018 - 15:12</span> <div class="field field--name-taxonomy-forums field--type-entity-reference field--label-above"> <div class="field__label">Forums</div> <div class="field__item"><a href="/freeling/taxonomy/term/4" hreflang="en">Linguistic Data</a></div> </div> <div property="schema:text" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>Hi All, </p> <p>Please I need your kind assistance as the online demo is not working. 
I have been trying for about 5 days and I always get the error message below: </p> <p>Fatal error: Uncaught Exception: String could not be parsed as XML in /home/operador/public_html/freeling-8.5.0/demo/demo.php:265 Stack trace: #0 /home/operador/public_html/freeling-8.5.0/demo/demo.php(265): SimpleXMLElement-&gt;__construct('') #1 {main} thrown in /home/operador/public_html/freeling-8.5.0/demo/demo.php on line 265</p> <p>Would you please advise?</p> <p>Thank you!<br /> Sara</p> </div> <section class="field field--name-comment-forum field--type-comment field--label-hidden comment-wrapper"> <article role="article" data-comment-user-id="63" id="comment-629" class="comment js-comment clearfix"> <span class="hidden" data-comment-timestamp="1532945563"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/63" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/63" typeof="schema:Person" property="schema:name" datatype="">lluisp</span></p> <p class="comment__time">Mon, 07/30/2018 - 12:12</p> <p class="comment__permalink"><a href="/freeling/comment/629#comment-629" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/629#comment-629" class="permalink" rel="bookmark" hreflang="en">Our servers were down some…</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>Our servers were down for some days for summer maintenance.</p> <p>It should be working now (though there may be other outages during August).</p></div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=629&amp;1=default&amp;2=en&amp;3=" token="NnIqo1FE9UtRpXKr5gzoERVlraBi6Iplxv_70AZKgNc"></drupal-render-placeholder> </div> </article> </section> Tue, 24 Jul 2018 13:12:33 +0000 Sara_Saad 654 at https://nlp.lsi.upc.edu/freeling https://nlp.lsi.upc.edu/freeling/node/654#comments Example 07: 
Extracting Triples with Semantic Information https://nlp.lsi.upc.edu/freeling/node/652 <span property="schema:name" class="field field--name-title field--type-string field--label-hidden">Example 07: Extracting Triples with Semantic Information</span> <span rel="schema:author" class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/freeling/user/7303" typeof="schema:Person" property="schema:name" datatype="">kashyap</span></span> <span property="schema:dateCreated" content="2018-07-09T12:51:18+00:00" class="field field--name-created field--type-created field--label-hidden">Mon, 07/09/2018 - 14:51</span> <div class="field field--name-taxonomy-forums field--type-entity-reference field--label-above"> <div class="field__label">Forums</div> <div class="field__item"><a href="/freeling/taxonomy/term/4" hreflang="en">Linguistic Data</a></div> </div> <div property="schema:text" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>As part of my current task, I am trying to feed knowledge (triples) extracted with Freeling into an ontology. 
I went through the FreeLing tutorials and tried to run example 7 (<a href="https://talp-upc.gitbooks.io/freeling-tutorial/content/code/example07.py.html">https://talp-upc.gitbooks.io/freeling-tutorial/content/code/example07.p…</a>), but when I run the program it gives the error below:</p> <p>FREELINGDIR environment variable not defined, trying /usr/local<br /> Text language is: en<br /> Traceback (most recent call last):<br /> File "pyfreeling_triple_extraction.py", line 172, in<br /> ProcessSentences(ls, sdb)<br /> File "pyfreeling_triple_extraction.py", line 46, in ProcessSentences<br /> if lsubj!="" and ldobj!="" :<br /> UnboundLocalError: local variable 'lsubj' referenced before assignment</p> <p>Even if I comment out those lines and only try to print get_predicates(), the only output is "Text language is: en".</p> <p>Here is my code:</p> <p>#! /usr/bin/python</p> <p>import pyfreeling<br /> import sys, os</p> <p>##---------------------------------------------<br /> ## Extract lemma and sense of word 'w' and store them<br /> ## in 'lem' and 'sens' respectively<br /> ##---------------------------------------------<br /> def extract_lemma_and_sense(w) :<br /> lem = w.get_lemma()<br /> sens=""<br /> if len(w.get_senses())&gt;0 :<br /> sens = w.get_senses()[0][0]<br /> return lem, sens</p> <p>## -----------------------------------------------<br /> ## Do whatever is needed with analyzed sentences<br /> ## -----------------------------------------------<br /> def ProcessSentences(ls, sdb) :</p> <p> # for each sentence in list<br /> for s in ls :</p> <p> # for each predicate in sentence<br /> for pred in s.get_predicates() :<br /> lsubj=""; ssubj=""; ldobj=""; sdobj=""<br /> # for each argument of the predicate<br /> for arg in pred :<br /> # if the argument is A1, store lemma and synset in ldobj, sdobj<br /> if arg.get_role()=="A1" :<br /> (ldobj,sdobj) = extract_lemma_and_sense(s[arg.get_position()])<br /> # if the argument is A0, store lemma and 
synset in lsubj, ssubj<br /> elif arg.get_role()=="A0" :<br /> (lsubj,ssubj) = extract_lemma_and_sense(s[arg.get_position()])<br /> # Get tree node corresponding to the word marked as argument head<br /> head = s.get_dep_tree().get_node_by_pos(arg.get_position())<br /> # check whether the subject is "by" in a passive structure (dependency label LGS)<br /> if lsubj=="by" and head.get_label()=="LGS" :<br /> # get first (and only) child, and use it as actual subject<br /> head = head.get_nth_child(0)<br /> (lsubj,ssubj) = extract_lemma_and_sense(head.get_word()) </p> <p> #if the predicate had both A0 and A1, we found a complete SVO triple. Let's output it.<br /> if lsubj!="" and ldobj!="" :<br /> (lpred,spred) = extract_lemma_and_sense(s[pred.get_position()])<br /> # if we found a synset for the predicate, obtain lemma synonyms and SUMO link<br /> if (spred!="") :<br /> ipred = sdb.get_sense_info(spred);<br /> lpred = "/".join(ipred.words) + " [" + ipred.sumo + "]"<br /> # if we found a synset for the subject, obtain lemma synonyms and SUMO link<br /> if (ssubj!="") :<br /> isubj = sdb.get_sense_info(ssubj);<br /> lsubj = "/".join(isubj.words) + " [" + isubj.sumo + "]"</p> <p> # if we found a synset for the object, obtain lemma synonyms and SUMO link<br /> if (sdobj!="") :<br /> idobj = sdb.get_sense_info(sdobj);<br /> ldobj = "/".join(idobj.words) + " [" + idobj.sumo + "]"</p> <p> print ("SVO : (pred: " , lpred, "[" + spred + "]")<br /> print (" subject:" , lsubj, "[" + ssubj + "]")<br /> print (" dobject:" , ldobj, "[" + sdobj + "]")<br /> print (" )")</p> <p>## -----------------------------------------------<br /> ## Set desired options for morphological analyzer<br /> ## -----------------------------------------------<br /> def my_maco_options(lang) :</p> <p> lpath = DATA + LANG + "/"</p> <p> # create options holder<br /> opt = pyfreeling.maco_options(lang);</p> <p> # Provide files for morphological submodules. 
Note that it is not<br /> # necessary to set file for modules that will not be used.<br /> opt.UserMapFile = "";<br /> opt.LocutionsFile = lpath + "locucions.dat";<br /> opt.AffixFile = lpath + "afixos.dat";<br /> opt.ProbabilityFile = lpath + "probabilitats.dat";<br /> opt.DictionaryFile = lpath + "dicc.src";<br /> opt.NPdataFile = lpath + "np.dat";<br /> opt.PunctuationFile = lpath + "../common/punct.dat";<br /> return opt;</p> <p>## ----------------------------------------------<br /> ## ------------- MAIN PROGRAM ---------------<br /> ## ----------------------------------------------</p> <p>## Check whether we know where to find FreeLing data files<br /> if "FREELINGDIR" not in os.environ :<br /> if sys.platform == "win32" or sys.platform == "win64" : os.environ["FREELINGDIR"] = "C:\\Program Files"<br /> else : os.environ["FREELINGDIR"] = "/usr/local"<br /> print &gt;&gt; sys.stderr, "FREELINGDIR environment variable not defined, trying ", os.environ["FREELINGDIR"]</p> <p>if not os.path.exists(os.environ["FREELINGDIR"]+"/share/freeling") :<br /> print &gt;&gt; sys.stderr, "Folder",os.environ["FREELINGDIR"]+"/share/freeling", "not found.\nPlease set FREELINGDIR environment variable to FreeLing installation directory"<br /> sys.exit(1)</p> <p># Location of FreeLing configuration files.<br /> DATA = os.environ["FREELINGDIR"]+"/share/freeling/";</p> <p># Init locales<br /> pyfreeling.util_init_locale("default");</p> <p># create language detector. Used just to show it. Results are printed<br /> # but ignored (after, it is assumed language is LANG)<br /> la=pyfreeling.lang_ident(DATA+"common/lang_ident/ident-few.dat");</p> <p># create options set for maco analyzer. 
Default values are Ok, except for data files.<br /> LANG="en";</p> <p>op= pyfreeling.maco_options(LANG);<br /> op.set_data_files( "",<br /> DATA + "common/punct.dat",<br /> DATA + LANG + "/dicc.src",<br /> DATA + LANG + "/afixos.dat",<br /> "",<br /> DATA + LANG + "/locucions.dat",<br /> DATA + LANG + "/np.dat",<br /> DATA + LANG + "/quantities.dat",<br /> DATA + LANG + "/probabilitats.dat");</p> <p># create analyzers<br /> tk=pyfreeling.tokenizer(DATA+LANG+"/tokenizer.dat");<br /> sp=pyfreeling.splitter(DATA+LANG+"/splitter.dat");<br /> sid=sp.open_session();<br /> mf=pyfreeling.maco(op);</p> <p># activate morpho modules to be used in next call<br /> mf.set_active_options(False, True, True, True, # select which among created<br /> True, True, False, True, # submodules are to be used.<br /> True, True, True, True ); # default: all created submodules are used</p> <p># create tagger, sense annotator, and parsers<br /> tg=pyfreeling.hmm_tagger(DATA+LANG+"/tagger.dat",True,2);<br /> sen=pyfreeling.senses(DATA+LANG+"/senses.dat");<br /> parser= pyfreeling.chart_parser(DATA+LANG+"/chunker/grammar-chunk.dat");<br /> dep=pyfreeling.dep_txala(DATA+LANG+"/dep_txala/dependences.dat", parser.get_start_symbol());</p> <p># create semantic DB module<br /> sdb = pyfreeling.semanticDB(DATA+LANG+"/semdb.dat");</p> <p># process input text<br /> lin=sys.stdin.readline();</p> <p>print "Text language is: "+la.identify_language(lin)</p> <p>while (lin) :</p> <p> l = tk.tokenize(lin);<br /> ls = sp.split(sid,l,False);</p> <p> ls = mf.analyze(ls);<br /> ls = tg.analyze(ls);<br /> ls = sen.analyze(ls);<br /> ls = parser.analyze(ls);<br /> ls = dep.analyze(ls);</p> <p> lin=sys.stdin.readline();</p> <p> # do whatever is needed with processed sentences<br /> ProcessSentences(ls, sdb)</p> <p># clean up<br /> sp.close_session(sid);</p> </div> <section class="field field--name-comment-forum field--type-comment field--label-hidden comment-wrapper"> <article role="article" data-comment-user-id="7303" 
id="comment-624" class="comment js-comment by-node-author clearfix"> <span class="hidden" data-comment-timestamp="1531155222"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/index.php/user/7303" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/7303" typeof="schema:Person" property="schema:name" datatype="">kashyap</span></p> <p class="comment__time">Mon, 07/09/2018 - 18:53</p> <p class="comment__permalink"><a href="/freeling/comment/624#comment-624" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/624#comment-624" class="permalink" rel="bookmark" hreflang="en">it is resolved. but when we…</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>it is resolved. but when we have a sentence like A0, AM-TMP and AM-LOC it does not generate any output.</p> </div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=624&amp;1=default&amp;2=en&amp;3=" token="z8h0Q9_fj0Q2USdvtspLTQNxC_iJZwi4OGxxt9rV_t4"></drupal-render-placeholder> </div> </article> <article role="article" data-comment-user-id="63" id="comment-625" class="comment js-comment clearfix"> <span class="hidden" data-comment-timestamp="1531305961"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/63" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/63" typeof="schema:Person" property="schema:name" datatype="">lluisp</span></p> <p class="comment__time">Wed, 07/11/2018 - 12:46</p> <p class="comment__permalink"><a href="/freeling/comment/625#comment-625" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/625#comment-625" class="permalink" rel="bookmark" hreflang="en">Of course it doesn&#039;t. 
The…</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>Of course it doesn't. The program extracts SVO triples, which means it looks for sentences with A0 *and* A1.  If the sentence has only A0, there is no direct object, thus no triple is extracted.</p> <p>However, this program is just an example. You should adapt the code to your needs to output whatever you want.</p></div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=625&amp;1=default&amp;2=en&amp;3=" token="dCFZ1gCjOAGlAeUWVM4DPgZXguvfpXADdXx5izxMM9g"></drupal-render-placeholder> </div> </article> <article role="article" data-comment-user-id="7301" id="comment-650" class="comment js-comment clearfix"> <span class="hidden" data-comment-timestamp="1541546750"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/index.php/user/7301" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/7301" typeof="schema:Person" property="schema:name" datatype="">kondra</span></p> <p class="comment__time">Wed, 11/07/2018 - 00:25</p> <p class="comment__permalink"><a href="/freeling/comment/650#comment-650" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/650#comment-650" class="permalink" rel="bookmark" hreflang="en">I have the same issue here…</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>I have the same issue here.<br /> get_predicates() method returns an empty list. It has been solved? 
Do I have to clone and re-compile Freeling again?</p> </div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=650&amp;1=default&amp;2=en&amp;3=" token="Pf5Fm7uZCs9D9yLyqGMV7RQ7w6o6ZtToP_Cw3Sn6cuE"></drupal-render-placeholder> </div> </article> <article role="article" data-comment-user-id="63" id="comment-651" class="comment js-comment clearfix"> <span class="hidden" data-comment-timestamp="1541604396"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/63" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/63" typeof="schema:Person" property="schema:name" datatype="">lluisp</span></p> <p class="comment__time">Wed, 11/07/2018 - 16:25</p> <p class="comment__permalink"><a href="/freeling/comment/651#comment-651" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/651#comment-651" class="permalink" rel="bookmark" hreflang="en">get_predicates will only…</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>get_predicates will only return something if you called the SRL.</p> <p>If you are using version 4.1, that task is performed by the dep_treeler module (and not for all languages, so check that your target language has this feature available).</p> <p>If you are using the master version, this has changed, and SRL is a separate module that needs to be called explicitly.</p> <p>Check the details for the SRL module in the 'master' version of the manual.<br /> <a href="https://talp-upc.gitbook.io/freeling-4-1-user-manual/v/master/">https://talp-upc.gitbook.io/freeling-4-1-user-manual/v/master/</a></p> </div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=651&amp;1=default&amp;2=en&amp;3=" token="M3jwWxVrKMUrNCZk7ozyzLkYQVB9Vjcwj3rNBX7KagY"></drupal-render-placeholder> </div> </article> </section> Mon, 09 Jul 
2018 12:51:18 +0000 kashyap 652 at https://nlp.lsi.upc.edu/freeling https://nlp.lsi.upc.edu/freeling/node/652#comments Different behavior with Catalan numbers https://nlp.lsi.upc.edu/freeling/node/617 <span property="schema:name" class="field field--name-title field--type-string field--label-hidden">Different behavior with Catalan numbers</span> <span rel="schema:author" class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/freeling/user/6017" typeof="schema:Person" property="schema:name" datatype="">jmfc90</span></span> <span property="schema:dateCreated" content="2017-02-10T12:03:44+00:00" class="field field--name-created field--type-created field--label-hidden">Fri, 02/10/2017 - 13:03</span> <div class="field field--name-taxonomy-forums field--type-entity-reference field--label-above"> <div class="field__label">Forums</div> <div class="field__item"><a href="/freeling/taxonomy/term/4" hreflang="en">Linguistic Data</a></div> </div> <div property="schema:text" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>Hello.<br /> I am having some troubles working with numbers in Catalan. 
In your demo page I use the text "tres mil" with Catalan language and using the default configuration and I get the following output:<br /> <code><br /> tres_mil<br /> 3000<br /> Z<br /> </code></p> <p>But If I use the text "nou mil" with exactly the same configuration I get the output:<br /> <code><br /> nou mil<br /> nou 1000<br /> AQ0MS00 Z<br /> </code></p> <p>Why in the second example did I not get the same output structure with one entity with the complete number (9000) as in the first example?<br /> Thanks.</p> </div> <section class="field field--name-comment-forum field--type-comment field--label-hidden comment-wrapper"> <article role="article" data-comment-user-id="63" id="comment-394" class="comment js-comment clearfix"> <span class="hidden" data-comment-timestamp="1486747009"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/63" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/63" typeof="schema:Person" property="schema:name" datatype="">lluisp</span></p> <p class="comment__time">Fri, 02/10/2017 - 18:16</p> <p class="comment__permalink"><a href="/freeling/comment/394#comment-394" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/394#comment-394" class="permalink" rel="bookmark" hreflang="en">that is an error</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>That is caused by "nou" being interpreted as an adjective instead of number.<br /> This happens in version 4.0 stable, which is the one used by the demo,</p> <p>It is fixed in the latest development version.</p> </div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=394&amp;1=default&amp;2=en&amp;3=" token="G2lBva5jZq9fVVdjIOSjgeqzDu-OoDY3Gy_igLIo9BU"></drupal-render-placeholder> </div> </article> </section> Fri, 10 Feb 2017 12:03:44 +0000 jmfc90 
617 at https://nlp.lsi.upc.edu/freeling https://nlp.lsi.upc.edu/freeling/node/617#comments NER and IOB sequential classification https://nlp.lsi.upc.edu/freeling/node/601 <span property="schema:name" class="field field--name-title field--type-string field--label-hidden">NER and IOB sequential classification</span> <span rel="schema:author" class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/freeling/user/5492" typeof="schema:Person" property="schema:name" datatype="">flopezbello</span></span> <span property="schema:dateCreated" content="2016-11-20T18:27:32+00:00" class="field field--name-created field--type-created field--label-hidden">Sun, 11/20/2016 - 19:27</span> <div class="field field--name-taxonomy-forums field--type-entity-reference field--label-above"> <div class="field__label">Forums</div> <div class="field__item"><a href="/freeling/taxonomy/term/4" hreflang="en">Linguistic Data</a></div> </div> <div property="schema:text" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>Hello. </p> <p>Consider the following statement:<br /> "La sentencia dictada por la Sra. Juez Letrado de Primera Instancia de Montevideo"</p> <p>When you use Freeling to parse for NER/NEC, you get:</p> <p>1 La el DA0FS0 DA pos=determiner|type=article|gen=feminine|num=singular - - - - - - -<br /> 2 sentencia sentencia NCFS000 NC pos=noun|type=common|gen=feminine|num=singular - - - - - - -<br /> 3 dictada dictar VMP00SF VMP pos=verb|type=main|mood=participle|num=singular|gen=feminine - - - - - - -<br /> 4 por por SP SP pos=adposition|type=preposition - - - - - - -<br /> 5 la el DA0FS0 DA pos=determiner|type=article|gen=feminine|num=singular - - - - - - -<br /> 6 Sra._Juez_Letrado_de_Primera_Instancia_de_el_Chuy sra._juez_letrado_de_primera_instancia_de_el_chuy NP00O00 NP pos=noun|type=proper|neclass=organization B-ORG - - - - - -</p> <p>which is not correct for line 6. 
One would expect, for line 6, something like:</p> <p>Sra. B-PER<br /> Juez I-PER<br /> Letrado I-PER<br /> de I-PER<br /> Primera I-PER<br /> Instancia I-PER<br /> de<br /> Montevideo B-LOC</p> <p>I've been trying to tune configuration files tw*dat and gen*dat with no luck.</p> <p>Any ideas? Am I missing something?</p> <p>Thanks!</p> </div> <section class="field field--name-comment-forum field--type-comment field--label-hidden comment-wrapper"> <article role="article" data-comment-user-id="63" id="comment-231" class="comment js-comment clearfix"> <span class="hidden" data-comment-timestamp="1479885928"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/63" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/63" typeof="schema:Person" property="schema:name" datatype="">lluisp</span></p> <p class="comment__time">Wed, 11/23/2016 - 08:24</p> <p class="comment__permalink"><a href="/freeling/comment/231#comment-231" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/231#comment-231" class="permalink" rel="bookmark" hreflang="en">FreeLing is a library</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>Freeling is a library which produces a data structure as a result.<br /> "analyzer" program is just a sample of how to use this library. 
This program has many options and output formats, but it does not cover every possible format or imaginable combination.</p> <p>As written in the manual, </p> <p><cite>Thus, the question is not why this program doesn't offer functionality X?, why it doesn't output information Y?, or why it doesn't present results in format Z?, but How should I use FreeLing library to write a program that does exactly what I need?.</cite></p> <p>So, if you need some output or processing not offered by the sample program, you need to write your own main program (or modify "analyzer" to get it).</p> <p>In your case, if all you need is to split the NEs, you can write a simple Python script (or Perl, awk, or whatever you prefer) to do the job.</p> </div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=231&amp;1=default&amp;2=en&amp;3=" token="_GnTzqfuDm6M17JqE8aP31qMFFCtVAIR5KVFEKLBXgY"></drupal-render-placeholder> </div> </article> <div class="indented"><article role="article" data-comment-user-id="5492" id="comment-237" class="comment js-comment by-node-author clearfix"> <span class="hidden" data-comment-timestamp="1480197284"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/index.php/user/5492" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/5492" typeof="schema:Person" property="schema:name" datatype="">flopezbello</span></p> <p class="comment__time">Sat, 11/26/2016 - 22:54</p> <p class="comment__permalink"><a href="/freeling/comment/237#comment-237" hreflang="en">Permalink</a></p> <p class="visually-hidden">In reply to <a href="/freeling/comment/231#comment-231" class="permalink" rel="bookmark" hreflang="en">FreeLing is a library</a> by <span lang="" about="/freeling/user/63" typeof="schema:Person" property="schema:name" datatype="">lluisp</span></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/237#comment-237" class="permalink" rel="bookmark" 
hreflang="en">Thanks Lluis; actually my</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>Thanks Lluis; actually my question was towards solving this gap through configuration, but I got your point.</p> <p>Regards</p> </div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=237&amp;1=default&amp;2=en&amp;3=" token="4DQrX5ZHlszmDKiPDj8WLEK5clWZej_5m4-3W_zEciI"></drupal-render-placeholder> </div> </article> </div><article role="article" data-comment-user-id="63" id="comment-238" class="comment js-comment clearfix"> <span class="hidden" data-comment-timestamp="1480318598"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/63" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/63" typeof="schema:Person" property="schema:name" datatype="">lluisp</span></p> <p class="comment__time">Mon, 11/28/2016 - 08:36</p> <p class="comment__permalink"><a href="/freeling/comment/238#comment-238" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/238#comment-238" class="permalink" rel="bookmark" hreflang="en">it is not configurable</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>Current configuration options do not allow that.<br /> That is why I told you the alternatives ;-)</p> </div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=238&amp;1=default&amp;2=en&amp;3=" token="pG_IegQcO3jlEVP6QVvugE1l64Xk83J-csrDHsmCxZo"></drupal-render-placeholder> </div> </article> </section> Sun, 20 Nov 2016 18:27:32 +0000 flopezbello 601 at https://nlp.lsi.upc.edu/freeling https://nlp.lsi.upc.edu/freeling/node/601#comments incorrect language on Semantic Graph Frame lemma https://nlp.lsi.upc.edu/freeling/node/584 <span 
property="schema:name" class="field field--name-title field--type-string field--label-hidden">incorrect language on Semantic Graph Frame lemma</span> <span rel="schema:author" class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/freeling/user/5408" typeof="schema:Person" property="schema:name" datatype="">carlesg</span></span> <span property="schema:dateCreated" content="2016-07-29T12:37:58+00:00" class="field field--name-created field--type-created field--label-hidden">Fri, 07/29/2016 - 14:37</span> <div class="field field--name-taxonomy-forums field--type-entity-reference field--label-above"> <div class="field__label">Forums</div> <div class="field__item"><a href="/freeling/taxonomy/term/4" hreflang="en">Linguistic Data</a></div> </div> <div property="schema:text" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>Hello,</p> <p>When I use the SemanticGraph to ask for the lemmas of the Frames in the Graph, I get the lemma in English, although I configured FreeLing for Spanish and I ask a question in Spanish.<br /> If I try the Spanish sentence 'Dime el valor del coche.', the tagger says:</p> <p>-------- TAGGER results -----------<br /> Di decir VMM02S0<br /> me me PP1CS00<br /> el el DA0MS0<br /> valor valor NCMS000<br /> de de SP<br /> el el DA0MS0<br /> coche coche NCMS000<br /> . . Fp </p> <p>But the semantic Graph says (look at the lemma on Frame F1):<br /> -------- SEMANTIC GRAPH results -----------<br /> ENTITY W2 : me<br /> ENTITY W3 : valor<br /> FRAME F1 : speak.01|talk.01 : 1 : 1 : 00941990-v<br /> ARG A2:Co-Agent : W2<br /> ARG A1:Topic : W3</p> <p>I'm using a Java class on Windows to invoke FreeLing via the JNI library.</p> <p>If I use the online demo, it works correctly and says the lemma is 'decir.00'.<br /> What is the problem? Is it a configuration problem?</p> <p>By the way, what does the final number in the frame lemma mean? 
(.00, .01...)</p> </div> <section class="field field--name-comment-forum field--type-comment field--label-hidden comment-wrapper"> <article role="article" data-comment-user-id="63" id="comment-174" class="comment js-comment clearfix"> <span class="hidden" data-comment-timestamp="1469966748"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/63" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/63" typeof="schema:Person" property="schema:name" datatype="">lluisp</span></p> <p class="comment__time">Fri, 07/29/2016 - 16:25</p> <p class="comment__permalink"><a href="/freeling/comment/174#comment-174" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/174#comment-174" class="permalink" rel="bookmark" hreflang="en">that is not the lemma</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>Those codes are not the lemma, but the semantic code for the verb meaning in propbank (<a href="http://propbank.github.io/">http://propbank.github.io/</a>)</p> <p>E.g for the verb "bear", propbank has two senses, bear.01 and bear.02:<br /> <a href="http://verbs.colorado.edu/propbank/framesets-english-aliases/bear.html">http://verbs.colorado.edu/propbank/framesets-english-aliases/bear.html</a></p> <p>The idea is that you get a language-independent information on the verb frame. So if the sentence was in English instead of Spanish, the verb codes would be the same. 
Also if the verb was "tell" instead of "say".<br /> In this way, you get a language-independent semantic graph that can be useful to compare texts that express the same meaning using different words, or even in different languages.</p> <p>If you get codes such as "decir.00", it is because it could not disambiguate properly or the sense was not found in propbank (that is why you get ".00", which is not in propbank).</p> <p>If you want to recover the lemma for a frame in the graph, you need to be aware that the results as presented by analyzer are very simple.<br /> If you use JSON or XML output you will have the full graph, with links among components (e.g. in the semantic graph frame you will have the semantic code, but if you want the lemma, the frame will contain the ID for the token that originated the frame, which will contain the lemma).<br /> If you call the library yourself, you can navigate the document data structure to locate the token that originated the frame and find out its lemma.</p> </div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=174&amp;1=default&amp;2=en&amp;3=" token="BhEljwqI3Sc0_R5oh__9aKxm47E8yOi2jEiGZZDiafU"></drupal-render-placeholder> </div> </article> <div class="indented"><article role="article" data-comment-user-id="5408" id="comment-176" class="comment js-comment by-node-author clearfix"> <span class="hidden" data-comment-timestamp="1470328864"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/5408" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/5408" typeof="schema:Person" property="schema:name" datatype="">carlesg</span></p> <p class="comment__time">Thu, 08/04/2016 - 18:41</p> <p class="comment__permalink"><a href="/freeling/comment/176#comment-176" hreflang="en">Permalink</a></p> <p class="visually-hidden">In reply to <a href="/freeling/comment/174#comment-174" class="permalink" rel="bookmark" 
hreflang="en">that is not the lemma</a> by <span lang="" about="/freeling/user/63" typeof="schema:Person" property="schema:name" datatype="">lluisp</span></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/176#comment-176" class="permalink" rel="bookmark" hreflang="en">Thank you.</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>Thank you.<br /> So the next question is, why does the demo show a different semantic code for the same word than the one I get in a local execution? Why can the demo not disambiguate the meaning, while the local execution can, for the same sentence?</p> <p>And the second question is,<br /> How can I get the lemma of the root word in the frame (in this case, 'Dime'-&gt;'decir') by traversing the semantic graph (the last option you mention)? How can I locate the original Word that originated that token? Which is the matching id?<br /> Is there any way to map the objects in the Semantic graph (doc.getSemanticGraph().getEntities() and doc.getSemanticGraph().getFrames() (root) and doc.getSemanticGraph().getFrames().getArguments()) to the objects in the Trees or Lists (Word, etc.) 
?</p> </div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=176&amp;1=default&amp;2=en&amp;3=" token="DLfLNaxt9ktUpx3AXaf67IBala6bFmVTM21kvfqbmDQ"></drupal-render-placeholder> </div> </article> </div><article role="article" data-comment-user-id="63" id="comment-183" class="comment js-comment clearfix"> <span class="hidden" data-comment-timestamp="1472037154"></span> <footer class="comment__meta"> <article typeof="schema:Person" about="/freeling/user/63" class="profile"> </article> <p class="comment__author"><span lang="" about="/freeling/user/63" typeof="schema:Person" property="schema:name" datatype="">lluisp</span></p> <p class="comment__time">Wed, 08/24/2016 - 13:12</p> <p class="comment__permalink"><a href="/freeling/comment/183#comment-183" hreflang="en">Permalink</a></p> </footer> <div class="comment__content"> <h3><a href="/freeling/comment/183#comment-183" class="permalink" rel="bookmark" hreflang="en">The demo is probably not</a></h3> <div class="clearfix text-formatted field field--name-comment-body field--type-text-long field--label-hidden field__item"><p>The demo is probably not running the same revision, but an older one, so there may be some differences.</p> <p>For The sentence &quot;Dime el valor del coche.&quot; you get the semantic graph:<br /> &lt;semantic_graph&gt;<br /> &lt;entity id=&quot;W2&quot; lemma=&quot;me&quot;&gt;<br /> &lt;mention id=&quot;t1.2&quot; words=&quot;me&quot;/&gt;<br /> &lt;/entity&gt;<br /> &lt;entity id=&quot;W3&quot; lemma=&quot;valor&quot; sense=&quot;05856388-n&quot;&gt;<br /> &lt;mention id=&quot;t1.4&quot; words=&quot;el valor de el coche&quot;/&gt;<br /> &lt;synonym lemma=&quot;valor&quot;/&gt;<br /> &lt;URI URI=&quot;<a href="http://wordnet-rdf.princeton.edu/wn30/05856388-n&amp;quot">http://wordnet-rdf.princeton.edu/wn30/05856388-n&amp;quot</a>; knowledgeBase=&quot;WordNet&quot;/&gt;<br /> &lt;URI URI=&quot;<a 
href="http://ontologyportal.org/SUMO.owl#Quantity&amp;quot">http://ontologyportal.org/SUMO.owl#Quantity&amp;quot</a>; knowledgeBase=&quot;SUMO&quot;/&gt;<br /> &lt;/entity&gt;<br /> &lt;frame id=&quot;F1&quot; lemma=&quot;decir.00&quot; sense=&quot;&quot; token=&quot;t1.1&quot;&gt;<br /> &lt;argument entity=&quot;W2&quot; role=&quot;A2&quot;/&gt;<br /> &lt;argument entity=&quot;W3&quot; role=&quot;A1&quot;/&gt;<br /> &lt;/frame&gt;<br /> &lt;/semantic_graph&gt;</p> <p>You can see that the frame &quot;F1&quot; corresponds to token &quot;t1.1&quot;</p> <p>Then, you can navigate the XML tree looking for a &quot;&lt;token&gt;&quot; with id=&quot;t1.1&quot; and you will get<br /> &lt;token ctag=&quot;VMM&quot; form=&quot;Di&quot; id=&quot;t1.1&quot; lemma=&quot;decir&quot; mood=&quot;imperative&quot; num=&quot;singular&quot; person=&quot;2&quot; pos=&quot;verb&quot; tag=&quot;VMM02S0&quot; type=&quot;main&quot;&gt;</p> <p>where you can extract that &quot;lemma&quot; is &quot;decir&quot;</p> <p>If you are not using XML, but directly calling the library, you need to go to token 1 in sentence 1 of the document (that is what t1.1 means)</p> </div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=183&amp;1=default&amp;2=en&amp;3=" token="YNjG99hsntx0W88UtWf81gFHECSejqrzjw5Pr2eFaME"></drupal-render-placeholder> </div> </article> </section> Fri, 29 Jul 2016 12:37:58 +0000 carlesg 584 at https://nlp.lsi.upc.edu/freeling https://nlp.lsi.upc.edu/freeling/node/584#comments
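<p>The frame-to-token lookup described in the last reply of the semantic graph thread can be sketched in Python with the standard library. This is only a minimal illustration, not FreeLing code: the element and attribute names (<code>frame</code>, <code>token</code>, <code>id</code>, <code>lemma</code>) are taken from the XML fragments quoted in that thread, while the <code>&lt;document&gt;</code> and <code>&lt;sentence&gt;</code> wrappers in the sample and the helper name <code>frame_lemmas</code> are assumptions made for the example.</p>

```python
import xml.etree.ElementTree as ET

# Trimmed sample assembled from the fragments quoted in the thread.
# The <document> and <sentence> wrappers are assumed for illustration only.
SAMPLE = """
<document>
  <sentence id="1">
    <token id="t1.1" form="Di" lemma="decir" tag="VMM02S0"/>
    <token id="t1.2" form="me" lemma="me" tag="PP1CS00"/>
    <token id="t1.4" form="valor" lemma="valor" tag="NCMS000"/>
  </sentence>
  <semantic_graph>
    <entity id="W2" lemma="me"/>
    <entity id="W3" lemma="valor" sense="05856388-n"/>
    <frame id="F1" lemma="decir.00" sense="" token="t1.1">
      <argument entity="W2" role="A2"/>
      <argument entity="W3" role="A1"/>
    </frame>
  </semantic_graph>
</document>
"""


def frame_lemmas(xml_text):
    """Map each frame id to the lemma of the token that originated it."""
    root = ET.fromstring(xml_text)
    # Index every <token> by its id, wherever it sits in the tree.
    tokens = {tok.get("id"): tok for tok in root.iter("token")}
    result = {}
    for frame in root.iter("frame"):
        tok = tokens.get(frame.get("token"))
        # Frames whose token id is not found map to None.
        result[frame.get("id")] = tok.get("lemma") if tok is not None else None
    return result


if __name__ == "__main__":
    print(frame_lemmas(SAMPLE))
```

<p>Using <code>iter()</code> rather than a fixed path keeps the sketch independent of how deeply the tokens are nested, which may differ between FreeLing versions.</p>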