Example 07: Extracting Triples with Semantic Information

Submitted by kashyap on Mon, 07/09/2018 - 14:51

As part of my current task, I am trying to feed knowledge (triples) extracted by FreeLing into an ontology. I went through the FreeLing tutorials and tried to run example 7 (https://talp-upc.gitbooks.io/freeling-tutorial/content/code/example07.p…), but when I run the program it gives the error below:

FREELINGDIR environment variable not defined, trying /usr/local
Text language is: en
Traceback (most recent call last):
File "pyfreeling_triple_extraction.py", line 172, in
ProcessSentences(ls, sdb)
File "pyfreeling_triple_extraction.py", line 46, in ProcessSentences
if lsubj!="" and ldobj!="" :
UnboundLocalError: local variable 'lsubj' referenced before assignment

Even if I comment out those lines and only try to print get_predicates(), the output is just "Text language is: en" and nothing else.
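I think I see the class of error at least: a local variable that is only assigned inside an if branch. A minimal script (no FreeLing involved; the extract/extract_fixed names are just mine for illustration) reproduces it:

```python
# Minimal reproduction of the UnboundLocalError pattern:
# 'sens' is only assigned inside the 'if', so when the branch is not
# taken the 'return' references a local that was never bound.
def extract(senses):
    if len(senses) > 0:
        sens = senses[0][0]
    return sens  # UnboundLocalError when 'senses' is empty

try:
    extract([])
except UnboundLocalError as e:
    print("caught:", e)

# The fix is to give the variable a default before the conditional:
def extract_fixed(senses):
    sens = ""
    if len(senses) > 0:
        sens = senses[0][0]
    return sens

print(repr(extract_fixed([])))                    # ''
print(extract_fixed([("02084071-n", 0.9)]))       # 02084071-n
```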

Here goes my code:

#!/usr/bin/env python3

import pyfreeling
import sys, os

## Extract lemma and sense of word 'w' and store them
## in 'lem' and 'sens' respectively
def extract_lemma_and_sense(w) :
    lem = w.get_lemma()
    sens = ""                         # default when the word has no senses
    if len(w.get_senses())>0 :
        sens = w.get_senses()[0][0]
    return lem, sens

## -----------------------------------------------
## Do whatever is needed with analyzed sentences
## -----------------------------------------------
def ProcessSentences(ls, sdb) :

    # for each sentence in list
    for s in ls :

        # for each predicate in sentence
        for pred in s.get_predicates() :
            lsubj=""; ssubj=""; ldobj=""; sdobj=""
            # for each argument of the predicate
            for arg in pred :
                # if the argument is A1, store lemma and synset in ldobj, sdobj
                if arg.get_role()=="A1" :
                    (ldobj,sdobj) = extract_lemma_and_sense(s[arg.get_position()])
                # if the argument is A0, store lemma and synset in lsubj, ssubj
                elif arg.get_role()=="A0" :
                    (lsubj,ssubj) = extract_lemma_and_sense(s[arg.get_position()])
                    # get tree node corresponding to the word marked as argument head
                    head = s.get_dep_tree().get_node_by_pos(arg.get_position())
                    # if the argument head is "by" with label LGS, we have a passive structure
                    if lsubj=="by" and head.get_label()=="LGS" :
                        # get first (and only) child, and use it as actual subject
                        head = head.get_nth_child(0)
                        (lsubj,ssubj) = extract_lemma_and_sense(head.get_word())

            # if the predicate had both A0 and A1, we found a complete SVO triple. Output it.
            if lsubj!="" and ldobj!="" :
                (lpred,spred) = extract_lemma_and_sense(s[pred.get_position()])
                # if we found a synset for the predicate, obtain lemma synonyms and SUMO link
                if spred!="" :
                    ipred = sdb.get_sense_info(spred)
                    lpred = "/".join(ipred.words) + " [" + ipred.sumo + "]"
                # if we found a synset for the subject, obtain lemma synonyms and SUMO link
                if ssubj!="" :
                    isubj = sdb.get_sense_info(ssubj)
                    lsubj = "/".join(isubj.words) + " [" + isubj.sumo + "]"
                # if we found a synset for the object, obtain lemma synonyms and SUMO link
                if sdobj!="" :
                    idobj = sdb.get_sense_info(sdobj)
                    ldobj = "/".join(idobj.words) + " [" + idobj.sumo + "]"

                print ("SVO : (pred: " , lpred, "[" + spred + "]")
                print ("       subject:" , lsubj, "[" + ssubj + "]")
                print ("       dobject:" , ldobj, "[" + sdobj + "]")
                print ("      )")

## -----------------------------------------------
## Set desired options for morphological analyzer
## -----------------------------------------------
def my_maco_options(lang) :

    lpath = DATA + lang + "/"

    # create options holder
    opt = pyfreeling.maco_options(lang)

    # Provide files for morphological submodules. Note that it is not
    # necessary to set file for modules that will not be used.
    opt.UserMapFile = ""
    opt.LocutionsFile = lpath + "locucions.dat"
    opt.AffixFile = lpath + "afixos.dat"
    opt.ProbabilityFile = lpath + "probabilitats.dat"
    opt.DictionaryFile = lpath + "dicc.src"
    opt.NPdataFile = lpath + "np.dat"
    opt.PunctuationFile = lpath + "../common/punct.dat"
    return opt

## ----------------------------------------------
## ------------- MAIN PROGRAM ---------------
## ----------------------------------------------

## Check whether we know where to find FreeLing data files
if "FREELINGDIR" not in os.environ :
    if sys.platform == "win32" or sys.platform == "win64" :
        os.environ["FREELINGDIR"] = "C:\\Program Files"
    else :
        os.environ["FREELINGDIR"] = "/usr/local"
    print("FREELINGDIR environment variable not defined, trying", os.environ["FREELINGDIR"], file=sys.stderr)

if not os.path.exists(os.environ["FREELINGDIR"]+"/share/freeling") :
    print("Folder", os.environ["FREELINGDIR"]+"/share/freeling", "not found.\nPlease set FREELINGDIR environment variable to FreeLing installation directory", file=sys.stderr)
    sys.exit(1)

# Location of FreeLing configuration files.
DATA = os.environ["FREELINGDIR"]+"/share/freeling/"
LANG = "en"

# Init locales
pyfreeling.util_init_locale("default")

# create language detector. Used just to show it. Results are printed
# but ignored (after, it is assumed language is LANG)
la = pyfreeling.lang_ident(DATA+"common/lang_ident/ident.dat")

# create options set for maco analyzer. Default values are Ok, except for data files.
op = pyfreeling.maco_options(LANG)
op.set_data_files( "",
                   DATA + "common/punct.dat",
                   DATA + LANG + "/dicc.src",
                   DATA + LANG + "/afixos.dat",
                   DATA + LANG + "/locucions.dat",
                   DATA + LANG + "/np.dat",
                   DATA + LANG + "/quantities.dat",
                   DATA + LANG + "/probabilitats.dat")

# create analyzers
tk = pyfreeling.tokenizer(DATA+LANG+"/tokenizer.dat")
sp = pyfreeling.splitter(DATA+LANG+"/splitter.dat")
sid = sp.open_session()
mf = pyfreeling.maco(op)

# activate morpho modules to be used in next call
mf.set_active_options(False, True, True, True,   # select which among created
                      True, True, False, True,   # submodules are to be used.
                      True, True, True, True)    # default: all created submodules are used

# create tagger, sense annotator, and parsers
tg = pyfreeling.hmm_tagger(DATA+LANG+"/tagger.dat", True, 2)
sen = pyfreeling.senses(DATA+LANG+"/senses.dat")
parser = pyfreeling.chart_parser(DATA+LANG+"/chunker/grammar-chunk.dat")
dep = pyfreeling.dep_txala(DATA+LANG+"/dep_txala/dependences.dat", parser.get_start_symbol())

# create semantic DB module
sdb = pyfreeling.semanticDB(DATA+LANG+"/semdb.dat")

# process input text
lin = sys.stdin.readline()

print("Text language is: " + la.identify_language(lin))

while lin :

    l = tk.tokenize(lin)
    ls = sp.split(sid, l, False)

    ls = mf.analyze(ls)
    ls = tg.analyze(ls)
    ls = sen.analyze(ls)
    ls = parser.analyze(ls)
    ls = dep.analyze(ls)

    # do whatever is needed with processed sentences
    ProcessSentences(ls, sdb)

    lin = sys.stdin.readline()

# clean up
sp.close_session(sid)

Of course it doesn't. The program extracts SVO triples, which means it looks for sentences with both A0 *and* A1. If a sentence has only A0, there is no direct object, so no triple is extracted.

However, this program is just an example. You should adapt the code to your needs to output whatever you want.
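For instance, if you also wanted to output predicate–subject pairs when there is no A1, you could relax the final check along these lines (a sketch; format_relation is a hypothetical helper, not part of the example):

```python
# Hypothetical helper: format a relation even when the direct object
# is missing. lpred/lsubj/ldobj are the lemma strings that
# ProcessSentences builds for the predicate, A0, and A1.
def format_relation(lpred, lsubj, ldobj):
    if lsubj == "":
        return None                                              # no A0: nothing to output
    if ldobj == "":
        return "SV  : (pred: %s subject: %s)" % (lpred, lsubj)   # A0 only
    return "SVO : (pred: %s subject: %s dobject: %s)" % (lpred, lsubj, ldobj)

# In ProcessSentences, instead of requiring both A0 and A1:
#     r = format_relation(lpred, lsubj, ldobj)
#     if r is not None: print(r)
print(format_relation("eat", "dog/Canine", "bone/Bone"))
print(format_relation("bark", "dog/Canine", ""))
```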

get_predicates() will only return something if you have called the SRL module.

If you are using version 4.1, that task is performed by the dep_treeler module (and not for all languages, so check that your target language has this feature available).

If you are using the master version, this has changed: SRL is a separate module that needs to be called explicitly.

Check details for SRL module in 'master' version of the manual.
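In 4.1, swapping dep_txala for dep_treeler looks roughly like this (pseudocode sketch from memory; verify the constructor signature and data-file path against the manual for your version, and note that not all languages ship treeler models):

```
# pseudocode sketch -- check exact class name and config path in the manual
dep = pyfreeling.dep_treeler(DATA + LANG + "/treeler/dependences.dat")
# ...
ls = dep.analyze(ls)   # fills dependencies and predicate/argument structure,
                       # so s.get_predicates() has something to return
```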