bash vs. java socket client

Submitted by flopezbello on Sat, 04/08/2017 - 16:41

Hello Lluis. I'm working on an integration between KNIME and FreeLing. I believe this can be a powerful combination, and I look forward to making them talk as smoothly as possible.

As a first stage, I've been trying a couple of approaches that I would like to comment on. I would really appreciate your feedback:

1- use the FreeLing Analyze server (by calling the Analyze client from KNIME)
This works fine, but I've noticed a significant performance difference when the analysis output level is set to "coref" (as opposed to "dep"): it's roughly a 100x difference in running time. Why that much?

It is a good option as an initial approach, though.

2- again, use Analyze server, but from a Java socket client
I've got your code lines from GitHub (FreelingSocketClient.java). IMHO, there's a tiny detail that can be improved there, regarding zero-terminated-strings:

Code could be changed from:
if(sb.toString().compareTo(SERVER_READY_MSG)!=0)
to
if(sb.toString().replaceAll("\0", "").compareTo(SERVER_READY_MSG)!=0)
in order to avoid spurious warning messages.
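For readers who want to try the idea in isolation, here is a minimal, self-contained sketch of the check. The class name ReadyCheck and both helper methods are hypothetical, and the value of SERVER_READY_MSG is an assumption, so compare it with the constant in your copy of FreelingSocketClient.java. The sketch uses String.replace (a literal replacement) instead of replaceAll (regex based), which should behave the same for a NUL character.

```java
public class ReadyCheck {
    // Assumed value: verify against the constant in FreelingSocketClient.java.
    static final String SERVER_READY_MSG = "FL-SERVER-READY";

    /** Removes NUL characters, such as the terminator of a zero-terminated string. */
    static String stripNul(String s) {
        return s.replace("\0", "");
    }

    /** Returns true if the received message, ignoring NULs, is the ready message. */
    static boolean isServerReady(String received) {
        return stripNul(received).equals(SERVER_READY_MSG);
    }

    public static void main(String[] args) {
        // With and without a trailing NUL, the ready message is accepted.
        System.out.println(isServerReady("FL-SERVER-READY\0"));
        System.out.println(isServerReady("FL-SERVER-READY"));
        // Any other message is still rejected.
        System.out.println(isServerReady("SOMETHING-ELSE\0"));
    }
}
```

The comparison only tolerates NUL characters; any other difference from the expected message would still trigger the warning.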

That said, which of these two approaches would be more stable in terms of scalability?

Cheers,
Fernando

The analyzer server and client are intended only as a proof of concept to show how FreeLing can be used in such an environment. They are not production-level software: the protocol is ad hoc and there are no failsafes. If you use them, do so at your own risk.

The right way to use FreeLing in a client-server application is to write your own server using a real framework (e.g. mongoose, Wt, pistache, cppCMS, ...) to create a REST server, and then call it using standard REST calls.
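As a rough illustration of the client side of such a setup, the sketch below posts plain text to a REST endpoint using the JDK's java.net.http client (Java 11 or later). Everything about the server is an assumption: FreeLing does not ship a REST server, so the URL, the plain-text request body, and the class name FreelingRestClient are hypothetical and would have to match whatever server you write yourself.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class FreelingRestClient {
    /** Builds a POST request carrying the text to analyze as a UTF-8 body. */
    static HttpRequest buildRequest(String url, String text) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "text/plain; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
    }

    /** Sends the text to the (hypothetical) server and returns the response body. */
    static String analyze(String url, String text) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(url, text), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Server returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint: replace with the address of your own server.
        System.out.println(analyze("http://localhost:8080/analyze", "El gato come pescado."));
    }
}
```

Compared with the ad hoc socket protocol, a standard HTTP exchange gives the client status codes and clear message boundaries, so no zero-terminated-string handling is needed.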

Regarding the speed difference:
If you use "--outlv dep", the default analyzer uses a rule-based dependency parser, which is fast, though less accurate.
If you use "--outlv coref", a quadratic statistical parser and SRL are used (slower, more accurate), plus an NE classifier (which requires some time-consuming feature extraction), plus the coreference algorithm itself (also quadratic, and also with time-consuming feature extraction).
We are working to speed up the dep+SRL statistical parser, as well as the coreference solver, but that will take quite some time.

Finally, note that although KNIME is GPL, FreeLing is Affero-GPL, so the integrated version of FreeLing should be distributed under the latter.

Thanks Lluis, I see your point. This is for academic research, so for the time being I'd rather concentrate on leveraging FreeLing's functionality than on building a REST server. That's why I found your code on GitHub very useful.

Cheers,
Fernando