Autodetect language

Submitted by xdavid on Fri, 11/11/2016 - 09:37

Hello, I'm processing text on a server with "analyzer_client my.server.com:50005 myoutput". For the moment I only handle Spanish text, but I need to do it for all languages and detect whether someone speaks in English, French, Spanish or some other language. I see "lang_ident" and "lang" in the documentation, but I don't know how to use them. Can you give me any help with this?

Thank you.

"analyzer" (in either standalone or client-server mode) is just a demo program that shows the main FreeLing functionalities; it does not provide every possible combination of them.

You can get much more from FreeLing if you write your own main programs that call the modules you need, in the order you need.

That said, you could solve your problem in two ways:

  • Have a look at the example programs in src/main/simple_examples (e.g. sample.cc) and build your own main program that identifies the language of the input and sends it to the right processors.
    Your program should create an instance of the analyzers for every target language, and send the text to the right instance depending on the output of the language identification module.
    Since you will control when modules are created and called, this approach may free you from needing the analyzer server mode.
  • If you want a client-server solution, you can launch one "analyzer" server configured to provide language identification (use the "--outlv ident" option). Then launch one "analyzer" server for EACH language your input may arrive in.
    Finally, modify the analyzer_client code to send the text to the language identification server and, depending on the answer, send it to the right analyzer for that language.
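
Both options share the same routing step: identify the language, then hand the text to whatever handles that language. A minimal C++ sketch of that step could look like the code below. It deliberately contains no FreeLing calls: LanguageRouter, toy_identify and make_demo_router are hypothetical names made up for this illustration, and the two std::function types are placeholders for the real language identification module (or identification server) and the real per-language analyzers (or per-language servers).

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Placeholder for the language identification step. In a real program this
// would wrap FreeLing's language identification module (first option) or a
// request to the identification server (second option).
using Identifier = std::function<std::string(const std::string&)>;

// Placeholder for one per-language analysis chain, or for a connection to
// one per-language "analyzer" server.
using Analyzer = std::function<std::string(const std::string&)>;

// Hypothetical helper: routes each text to the analyzer registered for the
// language reported by the identifier.
class LanguageRouter {
public:
  explicit LanguageRouter(Identifier identify)
      : identify_(std::move(identify)) {}

  void add_language(const std::string& code, Analyzer analyzer) {
    analyzers_[code] = std::move(analyzer);
  }

  // Returns the analyzer output, or an empty string when no analyzer is
  // registered for the detected language.
  std::string process(const std::string& text) const {
    const std::string lang = identify_(text);
    const auto it = analyzers_.find(lang);
    if (it == analyzers_.end()) return "";
    return it->second(text);
  }

private:
  Identifier identify_;
  std::map<std::string, Analyzer> analyzers_;
};

// Toy identifier for demonstration only: returns the language of the first
// marker word found. This is NOT how FreeLing identifies languages.
std::string toy_identify(const std::string& text) {
  static const std::map<std::string, std::string> markers = {
      {"the", "en"}, {"les", "fr"}, {"los", "es"}};
  std::istringstream words(text);
  std::string word;
  while (words >> word) {
    const auto it = markers.find(word);
    if (it != markers.end()) return it->second;
  }
  return "unknown";
}

// Builds a router whose "analyzers" only tag the text with a language code,
// standing in for the real analysis of each language.
LanguageRouter make_demo_router() {
  LanguageRouter router(toy_identify);
  for (const std::string code : {"en", "fr", "es"}) {
    router.add_language(code, [code](const std::string& text) {
      return "[" + code + "] " + text;
    });
  }
  return router;
}
```

With this sketch, make_demo_router().process("los gatos duermen") is routed to the "es" handler, and text in a language with no registered handler yields an empty string. In a real program you would replace toy_identify and the tagging lambdas with the actual identification and analysis calls, keeping only the dispatch structure.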

Finally, one last warning: the client-server mode was devised as a demo.
It is not reliable enough for production use. Use it at your own risk.