JRE Crashing when using Freeling

Submitted by Ayto on Mon, 08/16/2021 - 13:56

Hello everyone, hope you're well.

I am using the FreeLing library for my Master's thesis, in Java, through the JFreeling API.
While it works perfectly well on its own, I need to combine it with the ExtJWNL library to extract hyponyms and other semantic relationships.
Since introducing the ExtJWNL library, a SIGSEGV error has been thrown every time:

--------------------------------------------------------------------
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f9035e9d308, pid=4378, tid=0x00007f9072ba7700
#
# JRE version: OpenJDK Runtime Environment (8.0_292-b10) (build 1.8.0_292-8u292-b10-0ubuntu1~16.04.1-b10)
# Java VM: OpenJDK 64-Bit Server VM (25.292-b10 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libJfreeling.so+0x14d308] std::_Rb_tree, std::less, std::allocator >::_M_begin() const+0xc
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/ayto/IdeaProjects/Lexical/hs_err_pid4378.log
Compiled method (nm) 98787 953 n 0 edu.upc.Jfreeling.JfreelingJNI::Word_getSensesString__SWIG_1 (native)
total in heap [0x00007f905c4f4790,0x00007f905c4f4b20] = 912
relocation [0x00007f905c4f48b8,0x00007f905c4f4900] = 72
main code [0x00007f905c4f4900,0x00007f905c4f4b18] = 536
oops [0x00007f905c4f4b18,0x00007f905c4f4b20] = 8
Compiled method (c1) 98787 862 3 sun.nio.cs.UTF_8$Encoder::encodeLoop (28 bytes)
total in heap [0x00007f905c4a5790,0x00007f905c4a5fb8] = 2088
relocation [0x00007f905c4a58b8,0x00007f905c4a5928] = 112
main code [0x00007f905c4a5940,0x00007f905c4a5dc0] = 1152
stub code [0x00007f905c4a5dc0,0x00007f905c4a5e78] = 184
metadata [0x00007f905c4a5e78,0x00007f905c4a5e90] = 24
scopes data [0x00007f905c4a5e90,0x00007f905c4a5f08] = 120
scopes pcs [0x00007f905c4a5f08,0x00007f905c4a5f98] = 144
dependencies [0x00007f905c4a5f98,0x00007f905c4a5fa0] = 8
nul chk table [0x00007f905c4a5fa0,0x00007f905c4a5fb8] = 24
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
-----------------------------------------------------------------------------------------------
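The log itself says core dumps are disabled. Enabling them before launching the JVM would give a native core file to inspect, which is the most direct way to see the full C++ stack under libJfreeling.so. A minimal sketch, assuming Linux and a bash-like shell (the jar name and gdb paths are illustrative, not from the original post):

```shell
# Remove the core-size limit for this shell session only:
ulimit -c unlimited
echo "core limit: $(ulimit -c)"

# Then start the program from this same shell, e.g.:
#   java -cp target/classes Main
# After a native crash, a core file is written (its location depends on
# /proc/sys/kernel/core_pattern). Load it in gdb for a C++ backtrace:
#   gdb "$(command -v java)" core
#   (gdb) bt
```

The backtrace from gdb shows the native frames above `_Rb_tree::_M_begin()`, which the hs_err log truncates.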

I am using two document databases: a large one (~7200 docs) and a tiny sample (20 docs) for tests. I'm extracting four semantic relations: hypernyms, hyponyms, holonyms, and meronyms.
Running the program to extract one relation per run, on the test database everything works except the hyponyms, no matter the depth chosen.
On the large main database, none of them works.

What could be the problem here? I have been scratching my head over it on my own for a month now without making any progress; any help is welcome, thank you.

INFO: Running Ubuntu and coding in NetBeans (I also tried switching IDEs to IntelliJ).
Here is the Java Code:
https://justpaste.it/2k0kb

I am not familiar with ExtJWNL, so I am not sure of what may be the problem.

Also, it is not clear how you are combining ExtJWNL and FreeLing. It may be something as simple as a missing synset in the FL sense database.

You should try to narrow down the issue: does it happen with a specific word or text? Which? I see that the problem is with getSensesString... What parameters make it crash?
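One way to narrow this down: since a SIGSEGV in native code cannot be caught from Java, log each word (with a flush) immediately before the native call, so the last line printed identifies the crashing input. A minimal sketch of the pattern; the class and the `Function` stand-in are hypothetical, and in the real program the lambda would wrap the actual JFreeling `getSensesString` call:

```java
import java.util.List;
import java.util.function.Function;

// "Log before the native call" pattern. Because a SIGSEGV kills the JVM
// outright, the last flushed line on stderr pinpoints the offending word.
public class NativeCallTracer {
    public static String tracedSenses(String word, Function<String, String> sensesFn) {
        // Flush immediately so the message survives a hard native crash.
        System.err.println("CALLING getSensesString on: [" + word + "]");
        System.err.flush();
        return sensesFn.apply(word);
    }

    public static void main(String[] args) {
        // Stand-in for the native call; in the real program this would be
        // something like w -> word.getSensesString(...) on a JFreeling Word.
        Function<String, String> fakeNative = w -> "senses(" + w + ")";
        for (String w : List.of("dog", "cat")) {
            System.out.println(tracedSenses(w, fakeNative));
        }
    }
}
```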

Apologies for the very late reply; I hope it's not too late to get help.
Unfortunately I have been unable to narrow down the error. I have tried pinning down a possibly problematic file, or even word, but that's not it. The error doesn't always happen on the same file (in fact it may happen with previously working files), nor does it happen with specific words. Sometimes the error occurs just before FL starts processing a new text.

What I have been able to find out is that on the 20-document mini corpus I use for testing purposes, I am able to use FL to extract synsets (as is the case with the main corpus) as well as extract their hypernyms (which crashes on the main corpus), but not their hyponyms, unless the depth is set to 1. That means I can only extract the direct hyponyms of each word present in the mini corpus.

To give more detail on how I proceed: I use a SimpleFileVisitor to walk through the files present in different folders (the folders are the labels for my classification). First I process each text one by one with FL: tokenization, disambiguation, etc. I also capture each word and extract its synset. The goal is to write a sparse matrix for my SVM algorithm to train/test.
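The walk described above can be sketched with the standard `java.nio.file` API, collecting each file under its parent folder's name as the label. This is a minimal, self-contained version (the class name and corpus layout are illustrative, not from the poster's code):

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.*;

// Walk a corpus laid out as <root>/<LABEL>/<doc>, where the parent folder
// name is the classification label for each document.
public class LabelWalker extends SimpleFileVisitor<Path> {
    private final Map<String, List<Path>> filesByLabel = new TreeMap<>();

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        String label = file.getParent().getFileName().toString();
        filesByLabel.computeIfAbsent(label, k -> new ArrayList<>()).add(file);
        return FileVisitResult.CONTINUE;
    }

    public Map<String, List<Path>> result() { return filesByLabel; }

    public static void main(String[] args) throws IOException {
        // Build a tiny two-label corpus in a temp dir for demonstration.
        Path root = Files.createTempDirectory("corpus");
        Files.createDirectories(root.resolve("SPORT"));
        Files.writeString(root.resolve("SPORT").resolve("doc1.txt"), "match won");
        Files.createDirectories(root.resolve("POLITICS"));
        Files.writeString(root.resolve("POLITICS").resolve("doc2.txt"), "vote held");

        LabelWalker walker = new LabelWalker();
        Files.walkFileTree(root, walker);
        walker.result().forEach((label, files) ->
                System.out.println(label + " -> " + files.size() + " file(s)"));
    }
}
```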
So basically I end up with a file written like this:
LABEL n:m n:m n:m
LABEL n:m n:m n:m (...)

where LABEL is the name of the folder containing the file, n is the synset of the word, and m the number of its occurrences.
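Writing one such row can be sketched as below. The helper is hypothetical (not from the poster's code); the only design point worth noting is that SVM tools in the libsvm family require feature indices in ascending order, hence the sorted map:

```java
import java.util.*;
import java.util.stream.Collectors;

// Format one sparse-matrix row as "LABEL n:m n:m ...", where n is a synset
// identifier and m its occurrence count, with indices sorted ascending.
public class SparseRow {
    public static String format(String label, SortedMap<Integer, Integer> counts) {
        return label + " " + counts.entrySet().stream()
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        SortedMap<Integer, Integer> counts = new TreeMap<>();
        counts.put(2084071, 3);  // hypothetical synset offsets
        counts.put(1503061, 1);
        System.out.println(format("SPORT", counts));
        // -> SPORT 1503061:1 2084071:3
    }
}
```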

When I was done with this step, I introduced ExtJWNL to extract hypernyms, hyponyms, meronyms, etc., to see whether it improves the classification.
To do so, I simply made it so that in the middle of an FL analysis, I take the currently processed word and use ExtJWNL to get its hypernyms from its synset, up to a certain depth. I then add those hypernyms' synsets to the matrix.
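The depth-limited expansion described above can be written independently of any WordNet library. In this sketch the relation function is a hypothetical stand-in for an ExtJWNL lookup ("hypernyms of this synset"); the traversal logic itself is library-free, which also makes it easy to test in isolation from the native code:

```java
import java.util.*;
import java.util.function.Function;

// Breadth-first, depth-limited expansion of a semantic relation, with a
// visited set to guard against cycles in the relation graph.
public class RelationExpander {
    public static Set<String> expand(String start,
                                     Function<String, List<String>> relation,
                                     int maxDepth) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>(List.of(start));
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String s : frontier) {
                for (String related : relation.apply(s)) {
                    if (seen.add(related)) next.add(related);  // skip revisits
                }
            }
            frontier = next;
        }
        return seen;
    }

    public static void main(String[] args) {
        // Toy hypernym chain: dog -> canine -> mammal -> animal
        Map<String, List<String>> hyper = Map.of(
                "dog", List.of("canine"),
                "canine", List.of("mammal"),
                "mammal", List.of("animal"));
        Function<String, List<String>> rel =
                s -> hyper.getOrDefault(s, List.of());
        System.out.println(expand("dog", rel, 2)); // [canine, mammal]
    }
}
```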

But I have struggled with this process ever since without finding a solution.
Here's a copy of a bug report written by the JVM. I can see that the problem indeed occurs in getSensesString and during the walkFileTree process, but I haven't been able to understand what the issue is exactly.
https://jpst.it/2CkKz

Thousand thanks for your time.

I do not have enough information to help you... You should activate the debug options in FreeLing to see what is going on (build FreeLing with the cmake option -DTRACES=ON, and then set the TraceLevel and TraceModule variables in your program; see the user manual for details).

However, it may be easier to split your job into two stages: first, run a simpler program that uses FreeLing to produce your file with "LABEL n:m n:m n:m" etc.

Then write a separate program that reads this file and uses JWNL to look for hypernyms, hyponyms, or whatnot.

This way, you get rid of any possible interaction between FL and JWNL (which seems to be what is causing problems), and with them separated it may be easier to spot what the problem is (if it persists).
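Stage two of the split suggested above only needs to parse the intermediate file back into (label, synset → count) pairs before doing any WordNet lookups. A minimal sketch, with hypothetical names; running this in a separate JVM process is exactly what isolates it from the FreeLing native library:

```java
import java.util.*;

// Read one "LABEL n:m n:m ..." line back into a label plus a
// synset-id -> count map, ready for ExtJWNL enrichment in a second pass.
public class StageTwoReader {
    public static Map.Entry<String, Map<Integer, Integer>> parseLine(String line) {
        String[] parts = line.trim().split("\\s+");
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int i = 1; i < parts.length; i++) {
            String[] nm = parts[i].split(":");
            counts.put(Integer.parseInt(nm[0]), Integer.parseInt(nm[1]));
        }
        return Map.entry(parts[0], counts);
    }

    public static void main(String[] args) {
        var row = parseLine("SPORT 1503061:1 2084071:3");
        System.out.println(row.getKey() + " has " + row.getValue().size() + " features");
        // -> SPORT has 2 features
        // A second program would look up each synset id with ExtJWNL and add
        // hypernym/hyponym counts before rewriting the matrix.
    }
}
```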

I also somehow forgot to mention that during my tests, for some weird reason, the last word printed just before a crash would look like this:
"CURRENT WORD: 챀翧emmed"
No matter the word, some letters are replaced with CJK/Hangul characters, even though my documents are all in English.
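Output like "챀翧emmed" usually means the string was already corrupted on the native side (stray bytes decoded as CJK/Hangul code points), so a cheap diagnostic is to flag tokens containing characters outside the range an English corpus should produce, and log them before they reach the native call. A minimal sketch; the class name and the chosen cutoff are assumptions, not from the thread:

```java
// Flag tokens containing characters beyond Latin Extended-B (U+024F).
// This does not fix the corruption, but logging flagged tokens helps
// locate the first corrupted call in an all-English corpus.
public class TokenGuard {
    public static boolean looksCorrupted(String token) {
        return token.chars().anyMatch(c -> c > 0x024F);
    }

    public static void main(String[] args) {
        System.out.println(looksCorrupted("hemmed"));    // false
        System.out.println(looksCorrupted("챀翧emmed")); // true
    }
}
```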