Charsetname not found when using TreeTagger in a UIMA pipeline

Time: 2018-07-19 14:37:01

Tags: uima treetagger text-chunking dkpro-core

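I want to use TreeTagger to chunk German text inside a UIMA pipeline. Chunking works fine when I start the tagger from cmd, but running it in the pipeline fails with the following error: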

    org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.    
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:401)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:308)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:570)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:412)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:344)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:269)
    at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:150)
    at de.fraunhofer.fkie.re_analysis.RA_pipeline.main(RA_pipeline.java:107)
    Caused by: java.lang.NullPointerException: charsetName
    at java.io.InputStreamReader.<init>(InputStreamReader.java:99)
    at org.annolab.tt4j.TreeTaggerWrapper$Reader.<init>(TreeTaggerWrapper.java:946)
    at org.annolab.tt4j.TreeTaggerWrapper.process(TreeTaggerWrapper.java:598)
    at de.tudarmstadt.ukp.dkpro.core.treetagger.TreeTaggerChunker.process(TreeTaggerChunker.java:293)
    at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:385)
    ... 8 more

I think I need to specify the parameter "Chunk_Mapping_Location", but I don't know which file it expects. The chunker is initialized as follows:

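    import org.apache.uima.analysis_engine.AnalysisEngineDescription;
    import org.apache.uima.fit.factory.AnalysisEngineFactory;
    import de.tudarmstadt.ukp.dkpro.core.treetagger.TreeTaggerChunker;

    // Chunker configured with an explicit executable and a model file on disk
    AnalysisEngineDescription chunker =
        AnalysisEngineFactory.createEngineDescription(
            TreeTaggerChunker.class,
            TreeTaggerChunker.PARAM_EXECUTABLE_PATH, "C:/TreeTagger/bin/tree-tagger.exe",
            TreeTaggerChunker.PARAM_MODEL_LOCATION, "C:/TreeTagger/lib/german-chunker-utf8.par",
            TreeTaggerChunker.PARAM_PERFORMANCE_MODE, true,
            TreeTaggerChunker.PARAM_PRINT_TAGSET, true,
            TreeTaggerChunker.PARAM_LANGUAGE, "de"
        );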

1 Answer:

Answer 0: (score: 0)

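It looks like TreeTaggerChunker is missing a PARAM_MODEL_ENCODING parameter, which prevents it from being used with a directly specified model. I have opened an issue for it.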
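
The "charsetName" message in the stack trace fits this explanation: java.io.InputStreamReader throws a NullPointerException with exactly that message when it is given a null charset name. Below is a minimal sketch using only the JDK. The class and method names are hypothetical and it is not the actual tt4j code; it only illustrates what presumably happens when no model encoding is available.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;

// Hypothetical demo class; not part of tt4j or DKPro Core.
public class CharsetNameDemo {

    // Returns the NullPointerException message raised for the given charset
    // name, or null if the reader could be constructed without an NPE.
    public static String npeMessageFor(String charsetName) {
        InputStream in = new ByteArrayInputStream(new byte[0]);
        try {
            new InputStreamReader(in, charsetName);
            return null;
        } catch (NullPointerException e) {
            return e.getMessage();
        } catch (UnsupportedEncodingException e) {
            // An unknown (but non-null) charset name is a different failure.
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // A null encoding reproduces the message seen in the stack trace.
        System.out.println(npeMessageFor(null));
        // A valid encoding constructs the reader without complaint.
        System.out.println(npeMessageFor("UTF-8"));
    }
}
```

The first call should report "charsetName", matching the "Caused by" line in the question, while the second call finds no NullPointerException to report.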

You can work around this by packaging the TreeTagger model as a JAR with the build.xml Ant script that ships with DKPro Core. The process is described in the DKPro Core developer documentation.

Disclosure: I am one of the DKPro Core developers.