Contributions | FreeLing Home Page

FreeLing is developed and maintained by the TALP Research Center at Universitat Politècnica de Catalunya.

Many people have contributed to the project by reporting problems, suggesting improvements, submitting code, and extending or creating linguistic databases. Here is a list of these people. Help us keep it complete and free of errors.

2018
- Joan Codina, from the TALN research group at Universitat Pompeu Fabra, built wrappers to integrate FreeLing into UIMA.
- Eric Kafe from Megadoc updated the synset data for Catalan, English, Galician, Spanish, and Portuguese from the MCR project and formatted the files for use in FreeLing's semantic modules.
2017
- Sergi Llamas wrote the Word Embeddings module and the semantics-based orthographic corrector as his Computer Science degree thesis.
- Guillem Córdoba wrote the Automatic Speech Recognition module as his Computer Science degree thesis.
2016
- Scott Sadowsky, assistant professor at the Catholic University of Chile, provided the dictionary for the Chilean Spanish variant.
- Marina Lloberes greatly improved the accuracy of the rule-based dependency parser for Spanish and Catalan as part of her PhD research at the GRIAL research group at Universitat de Barcelona.
- Johannes Heinecke, working at France Télécom R&D, programmed the date recognition modules for German and French.
- Alexandre Rademaker and his team at IBM Brasil Research Lab greatly improved the Portuguese data (morphological dictionary, WordNet, affixation rules, multiwords, ...).
2015
- Agustín Gravano at Universidad de Buenos Aires provided the dictionary for the Argentinian Spanish variant.
- Samuel Pedrajas wrote the Summarization module as his Computer Science degree thesis.
2014
- Jaume Jané and Daniel Giribet pointed out that analyzer_client was not suited to processing several files without reopening the socket, and suggested a solution.
- Cristina Sánchez-Marco prepared the Norwegian dictionary and other data files, and trained the Norwegian tagger during her post-doc at Gjøvik University College.
2013
- Stanilovsky Evgeny made the changes required to compile FOMA on Windows, compiled the binary packages for MS-Windows, and built the binary package with the required third-party libraries.
- Ivan Stana provided tokenizer, splitter, multiwords, and phonetic encoding data files for Czech.
- Kelly Davis pointed out crucial weak points in thread safety.
2012
- Stanilovsky Evgeny compiled the binary packages for MS-Windows.
- Cristina Sánchez-Marco extended the Spanish dictionary to cover Old Spanish (12th to 16th centuries) and trained the tagger models, as part of her PhD thesis at Universitat Pompeu Fabra.
- Kelly Davis extended the English list of recognized physical measurement units to cover all standard units of the International System of Units (SI).
- Eric Kafe from Megadoc extracted the synset data for Catalan, English, Galician, and Spanish from the MCR project and formatted the files for use in FreeLing's semantic modules.
- Stanilovsky Evgeny provided Russian dates, numbers, and quantities modules.
2011
- Pablo Gamallo, Marcos Garcia, and Isaac González from ProLNat@GE Research Group (Universidade de Santiago de Compostela) and Cilenis wrote the Portuguese and Galician dates/numbers/quantities recognition modules.
- Stanilovsky Evgeny provided the Russian dictionary and trained the Russian PoS tagger.
- Vi-Clone funded the development of a spell correction module (to be published soon) that will enable FreeLing to process non-standard texts.
- Stanilovsky Evgeny made the necessary modifications and provided the project files to compile FreeLing 3.0 in MSVC.
- Stanilovsky Evgeny provided the Russian tokenizer and splitter configuration files, and programmed the Russian dates/numbers/quantities recognition modules.
2010
- Miguel Solla and Xavier Gómez Guinovart, from the Seminario de Lingüística Informática at Universidade de Vigo, developed the Galician accent handling module, improving the affix treatment for this language.
- Israel Olalla, from iSOCO, cross-compiled FreeLing using MinGW, producing binary DLLs for Windows.
- Pablo Gamallo and Marcos Garcia, from Universidade de Santiago de Compostela, improved the Portuguese dictionary, as well as the tokenizer and splitter configuration files.
2009
- Samuel Reese integrated the WordNet-based UKB word sense disambiguator developed by Eneko Agirre and Aitor Soroa at the IXA group at the Basque Country University.
- Daniel Vicente Quílez, working on the Eslema project at Universidad de Oviedo, built the morphological dictionary for Asturian.
- Pablo Gamallo and Marcos Garcia, from Universidade de Santiago de Compostela, adapted the (European) Portuguese dictionary from LABEL-LEX, and trained the PoS tagger.
- Pablo Gamallo and Marcos Garcia, from Universidade de Santiago de Compostela, enlarged the Galician dictionary and retrained the Galician PoS taggers with a larger corpus, improving their accuracy.
- Fundació Barcelona Media, developers of El Corrector, a Catalan grammar and spell checker, allowed us to include their GPL dictionary in FreeLing.
- Jordi Carrera, Marina Lloberes, and Irene Castellón from GRIAL research group extended and improved dependency and shallow parsing grammars for Catalan, Spanish and English.
- Francis Tyers, Apertium developer at Prompsit, adapted the Eurfa dictionary and trained the tagger, providing morphological and PoS tagging data sets for Welsh.
- Vitalie Scurtu wrote the Italian number recognizer module.
- The Spanish Science and Innovation Ministry funded part of FreeLing development through the KNOW project.
2008
- Miquel Collado developed the coreference resolution module, and trained it for Spanish.
- Daniel Berndt and Gemma Boleda debugged and extended the English dictionary.
- Daniel Berndt extended the functionalities of the basic NE recognizer.
- Dmitry Vitkovsky greatly improved the efficiency of the tokenizer, speeding up the whole analysis chain.
- Vitalie Scurtu wrote the Italian number recognizer module.
- The Spanish Science and Innovation Ministry funded part of FreeLing development through the KNOW project.
- The Spanish Industry Ministry funded part of FreeLing development through the EuroOpenTrad project.
2007
- Javier Puche extended the Java API to support access to parse and dependency trees.
- Bruno Martinez contributed valuable bug fixes and made the changes necessary to compile FreeLing on 64-bit architectures.
- The current 76,000-lemma Spanish dictionary was obtained from the Spanish Resource Grammar project developed by Montserrat Marimon at the Institut Universitari de Lingüística Aplicada of the Universitat Pompeu Fabra. These data are included in this package under their original license, the Lesser General Public License for Linguistic Resources (LGPLLR) (see COPYING file).
- The Spanish Science and Innovation Ministry funded part of FreeLing development through the KNOW project.
- The Spanish Industry Ministry funded part of FreeLing development through the EuroOpenTrad project.
2006
- Jordi Atserias managed to compile FreeLing-1.4 for Windows using cygwin.
- Montserrat Marimon contributed to debugging and improving several parts of the Spanish linguistic data.
- Gorka Labaka, Mikel Lersundi, and Aingeru Mayor from the IXA group at the Basque Country University contributed extensive testing and suggestions on the Named Entity recognizer, the shallow parser, and the dependency parsing module.
- The Spanish Industry Ministry funded part of FreeLing development through the OpenTrad project.
2005
- Daniel Ferrés greatly extended the list of physical magnitudes recognized by the quantities module.
- Mikel Forcada and the InterNostrum team at Universitat d'Alacant completed the Spanish and Catalan dictionaries to cover the same lemmas in both languages, enlarging the dictionaries from 5,000 to 6,500 lemmas.
- The TALP and CLiC research centers, which developed the Catalan WordNet, granted permission to distribute the synsets for the 6,500 most frequent lemmas.
- The feature extraction module is based on code developed by Dan Roth's Cognitive Computation Group at the University of Illinois at Urbana-Champaign (UIUC), whom we thank for allowing us to distribute our modified version under the GPL.
  (Since version 2.0, this module has been distributed separately as the Fries library rather than as part of FreeLing. Since version 3.0, FreeLing no longer uses it.)
- The TALP and CLiC research centers and Natural Language Processing researchers at UNED (Universidad Nacional de Educación a Distancia), who developed the Spanish WordNet, granted permission to distribute the synsets for the 6,500 most frequent lemmas.
- The English WordNet was developed by the Cognitive Science Laboratory at Princeton University under the direction of Professor George A. Miller. Synset information is included in FreeLing under the original WordNet license terms (see COPYING file).
- The Italian dictionary is extracted from the Morph-it! lexicon developed by Marco Baroni and his colleagues at the Scuola Superiore di Lingue Moderne per Interpreti e Traduttori (SSLMIT) of the University of Bologna. These data are included in this package under their original Creative Commons license (see COPYING file).
- The Galician dictionary was obtained from the OpenTrad project and was developed by Xavier Gómez Guinovart and the members of the Seminario de Lingüística Informática at Universidade de Vigo. These data are included in this package under their original Creative Commons license (see COPYING file).
2004
- Montserrat Civit developed the shallow parsing grammars for Spanish and Catalan.
2003
- Spanish and Catalan linguistic data were originally developed by people at CLiC, Centre de Llenguatge i Computació at Universitat de Barcelona.