OpenNLP throws an error when using the Thai model

Date: 2018-07-23 07:15:00

Tags: java nlp opennlp thai

I tried to follow the advice here, but I get this error:

C:\OpenNLP_models\tool\apache-opennlp-1.5.3-bin\apache-opennlp-1.5.3\bin>opennlp TokenizerME C:\OpenNLP_models\tool\apache-opennlp-1.5.3-bin\apache-opennlp-1.5.3\bin\thai.tok.bin < test.txt

Loading Tokenizer model ... Exception in thread "main" java.lang.NullPointerException
    at opennlp.tools.util.model.BaseModel.getManifestProperty(BaseModel.java:491)
    at opennlp.tools.util.model.BaseModel.initializeFactory(BaseModel.java:245)
    at opennlp.tools.util.model.BaseModel.loadModel(BaseModel.java:237)
    at opennlp.tools.util.model.BaseModel.<init>(BaseModel.java:181)
    at opennlp.tools.tokenize.TokenizerModel.<init>(TokenizerModel.java:125)
    at opennlp.tools.cmdline.tokenizer.TokenizerModelLoader.loadModel(TokenizerModelLoader.java:39)
    at opennlp.tools.cmdline.tokenizer.TokenizerModelLoader.loadModel(TokenizerModelLoader.java:31)
    at opennlp.tools.cmdline.ModelLoader.load(ModelLoader.java:62)
    at opennlp.tools.cmdline.tokenizer.TokenizerMETool.run(TokenizerMETool.java:41)
    at opennlp.tools.cmdline.CLI.main(CLI.java:225)

The test.txt file contains the sentence "ผมหิวข้าว".

Can anyone tell me how to fix this? I want to use the POSTagger. Thanks.

1 answer:

Answer 0 (score: 0)

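I think you are missing the manifest.properties file. You can unzip the thai.tok.bin file and check that it contains the following files:

  1. token.model (the binary tokenizer model)
  2. manifest.properties (configuration)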
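
If you would rather check from Java than unzip by hand, here is a minimal sketch that lists the entries of the model archive and reports which of the two files above are missing. It uses only java.util.zip from the JDK and no OpenNLP API; the class name ModelArchiveCheck and the default path thai.tok.bin are placeholders I made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of OpenNLP.
public class ModelArchiveCheck {

    // The two entries this answer expects in a tokenizer model archive.
    static final List<String> EXPECTED =
            Arrays.asList("token.model", "manifest.properties");

    // Returns the names of all entries in the zip archive at the given path.
    static List<String> listEntries(Path archive) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    // Returns the expected entry names that are absent from the archive.
    static List<String> missingEntries(Path archive) throws IOException {
        List<String> missing = new ArrayList<>(EXPECTED);
        missing.removeAll(listEntries(archive));
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path; pass the real location as the first argument.
        Path archive = Paths.get(args.length > 0 ? args[0] : "thai.tok.bin");
        System.out.println("Entries: " + listEntries(archive));
        System.out.println("Missing: " + missingEntries(archive));
    }
}
```

If "Missing" lists manifest.properties, that would be consistent with the guess above.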

The contents of manifest.properties should be similar to those in the question you linked to:

Manifest-Version=1.0
Language=th
OpenNLP-Version=1.5.0
Component-Name=TokenizerME
useAlphaNumericOptimization=false
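
To compare your own manifest against the keys above, a rough sketch like the following reads manifest.properties straight out of the archive and prints it. It again relies only on JDK classes (java.util.zip and java.util.Properties); the class name ManifestDump and the default path are hypothetical, and the method returns null when the entry is absent.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of OpenNLP.
public class ManifestDump {

    // Loads manifest.properties from the archive; returns null if the entry is absent.
    static Properties readManifest(Path archive) throws IOException {
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            ZipEntry entry = zip.getEntry("manifest.properties");
            if (entry == null) {
                return null;
            }
            Properties manifest = new Properties();
            try (InputStream in = zip.getInputStream(entry)) {
                manifest.load(in);
            }
            return manifest;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path; pass the real location as the first argument.
        Path archive = Paths.get(args.length > 0 ? args[0] : "thai.tok.bin");
        Properties manifest = readManifest(archive);
        if (manifest == null) {
            System.out.println("manifest.properties not found in " + archive);
            return;
        }
        // Print every key/value pair so it can be compared with the list above.
        for (String key : manifest.stringPropertyNames()) {
            System.out.println(key + "=" + manifest.getProperty(key));
        }
    }
}
```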