Domain-specific NLP

Submitted by flopezbello on Tue, 07/05/2016 - 20:51

Is it possible to train FreeLing for specific domains? More precisely, we are thinking of processing medical (health and genomics) text, which includes specific terminology such as genes, phenotypes (or health conditions / traits), and proteins.

Also, it would be very interesting to add (besides specific dictionaries) ontologies that pertain to these domains.

We can assume that English is the base language.

Thanks,
Fernando

FreeLing's philosophy is to keep code and data separate. As long as the data are in the right format, you can tune the behaviour of any module simply by changing the files it uses.

For terminology, you need to change either the dictionary (in folder data/en/dictionary/entries) for single-word terms, or the multiwords file (data/en/locutions.dat) for terms that span several words.
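
As a rough illustration, the sketch below turns a list of domain terms into candidate lines for those two files. The line layouts are assumptions, not something stated in this thread: dictionary entries are assumed to be "form lemma tag", and multiword entries "form_with_underscores lemma tag ambiguity-flag". Check them against the files shipped with your FreeLing version before using anything like this. The function names, the default tag NN and the flag I are likewise only illustrative.

```python
# Sketch only: the output line formats are ASSUMED, not taken from the
# FreeLing documentation. Compare with the shipped data files first.

def dictionary_line(form, lemma, tag):
    # Assumed dictionary layout: "form lemma tag"
    return f"{form} {lemma} {tag}"


def multiword_line(words, tag, ambiguity="I"):
    # Assumed multiword layout: "w1_w2 lemma tag flag"
    joined = "_".join(w.lower() for w in words)
    return f"{joined} {joined} {tag} {ambiguity}"


def build_entries(terms, tag="NN"):
    """Split terms into single-word dictionary lines and multiword lines."""
    dict_lines = []
    mw_lines = []
    for term in terms:
        words = term.split()
        if not words:
            continue  # skip empty strings
        if len(words) == 1:
            form = words[0].lower()
            dict_lines.append(dictionary_line(form, form, tag))
        else:
            mw_lines.append(multiword_line(words, tag))
    return dict_lines, mw_lines


if __name__ == "__main__":
    # Hypothetical term list; in practice this would come from a
    # domain lexicon or terminology export.
    dict_lines, mw_lines = build_entries(["BRCA1", "cystic fibrosis"])
    print("\n".join(dict_lines))
    print("\n".join(mw_lines))
```

The generated lines would then be appended to (or placed alongside) the existing dictionary and multiwords files, following whatever layout those files actually use.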

For ontologies, that depends on what you want to do. In general, you can replace the WordNet (WN) synsets with any other codes just by changing the file data/en/senses30.dat (or by creating a new file and referencing it in the main configuration file).
If you want to use word sense disambiguation (WSD), you'll also need to change the file data/common/wn30.dat, which provides the relations between synsets (or between whatever codes you use).
If you want to navigate the ontology, the class semantic_db provides some basic methods, but you may be better off using something more adapted to your actual ontology.
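
To give an idea of what "something more adapted" could look like, here is a minimal sketch of navigating an ontology outside FreeLing. It does not use semantic_db or any FreeLing API; the IS_A table and the ancestors function are hypothetical, and the codes in it are made-up placeholders for whatever identifiers your ontology uses.

```python
from collections import deque

# Hypothetical is-a relations: each code maps to its direct parents.
# A real table would be loaded from your ontology, not written by hand.
IS_A = {
    "BRCA1": ["tumor suppressor gene"],
    "tumor suppressor gene": ["gene"],
    "gene": [],
}


def ancestors(code, is_a):
    """Return all ancestors of `code`, nearest first (breadth-first)."""
    seen = []
    queue = deque(is_a.get(code, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.append(parent)
            queue.extend(is_a.get(parent, []))
    return seen


if __name__ == "__main__":
    print(ancestors("BRCA1", IS_A))
```

The same codes could be the ones you put in the senses file, so that the sense assigned to a word by FreeLing can be looked up directly in your own structure.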