I'm trying to train a NER model using the train-nerc directory and the demo data that exists in train-nerc/corpus.
I'm following the scripts and have run into several problems:
1. corpus/bin/extract-gaz.sh (called from prepare-corpus) gets stuck inside the second loop. It looks like the achieved ratio never reaches the goal ratio; I suspect this is due to the small size of the demo set. Should it work fine with the demo data, meaning something else is wrong on my side?
2. ner/bin/encode-corpus.sh: the README says only 2 arguments are needed, language and features. But the script also needs gaz-type (which the README lists as needed for NEC, not NER). Without a gaz-type, the gaz features get a path that doesn't exist because of "sed "s/\(gaz.*-[cp].dat\)/\1.$gz/"": when gz is missing, the filename ends with a "." after the extension. And even apart from that, the files that exist in en/nerc/data after running extract-gaz.sh don't match the names in the features files. E.g. the features look for gazPER-p.dat, but that file does not exist; what I have is gazPER-p.dat.rich.train.
So, does this mean that gaz-type is actually required, and that the README is out of date?
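To make the trailing "." problem in point 2 concrete, here is a minimal reproduction of just the substitution, outside the real script. The input line "gazPER-p.dat" and the rewrite helper are my own stand-ins for illustration (I'm assuming the feature file contains a name of that shape); only the sed expression is taken from encode-corpus.sh.

```shell
#!/bin/sh
# Hypothetical feature-file entry; the real feature files may contain more text.
line="gazPER-p.dat"

# Illustrative helper (not part of the original scripts): applies the same
# substitution quoted above with the given gaz-type value.
rewrite() {
  gz="$1"
  printf '%s\n' "$line" | sed "s/\(gaz.*-[cp].dat\)/\1.$gz/"
}

empty_result=$(rewrite "")      # gaz-type not given
rich_result=$(rewrite "rich")   # gaz-type given as "rich"

echo "without gaz-type: $empty_result"
echo "with gaz-type:    $rich_result"
```

With an empty gaz-type the name comes out as "gazPER-p.dat." (note the dot at the end), and with "rich" it becomes "gazPER-p.dat.rich". Neither is the same as the gazPER-p.dat.rich.train file I actually have on disk, which is the second half of the question.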
I actually have more questions, but the site keeps thinking I'm spamming, so I'll post them along the way...