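How do I use a custom TokensRegex rules annotator with the Stanford CoreNLP server?

Asked: 2016-11-05 18:52:41

Tags: stanford-nlp stanford-nlp-server corenlp-server

When CoreNLP is run from the command line, the TokensRegex color rules annotator (stanford-corenlp-full-2016-10-31/tokensregex/color.rules.txt) loads successfully, but the same configuration fails on the web server with java.lang.IllegalArgumentException: Unknown annotator: color.

Setup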

# custom.properties
annotators=tokenize,ssplit,pos,lemma,ner,regexner,color
customAnnotatorClass.color = edu.stanford.nlp.pipeline.TokensRegexAnnotator
color.rules = tokensregex/color.rules.txt

Command line

$ java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -props custom.properties -file ./tokensregex/color.input.txt -outputFormat text
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Registering annotator color with class edu.stanford.nlp.pipeline.TokensRegexAnnotator
...
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator color
[main] INFO edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor - Reading TokensRegex rules from tokensregex/color.rules.txt
[main] INFO edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor - Read 7 rules

# color.input.txt.output
Sentence #1 (9 tokens):
Both blue and light blue are nice colors.
[Text=Both CharacterOffsetBegin=0 CharacterOffsetEnd=4 PartOfSpeech=CC Lemma=both NamedEntityTag=O]
[Text=blue CharacterOffsetBegin=5 CharacterOffsetEnd=9 PartOfSpeech=JJ Lemma=blue NamedEntityTag=COLOR NormalizedNamedEntityTag=#0000FF]
...

Server

  1. java -mx2g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -c custom.properties
  2. wget --post-data 'Both blue and light blue are nice colors.' 'localhost:9000/?properties={"annotators":"tokenize,ssplit,pos,lemma,ner,regexner,color","outputFormat":"json"}' -O -

    HTTP request sent, awaiting response... 500 Internal Server Error
        2016-11-05 14:41:27 ERROR 500: Internal Server Error.
    
    java.lang.IllegalArgumentException: Unknown annotator: color
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.ensurePrerequisiteAnnotators(StanfordCoreNLP.java:304)
        at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.getProperties(StanfordCoreNLPServer.java:713)
        at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.handle(StanfordCoreNLPServer.java:540)
        at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
        at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
        at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)
        at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)
        at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
        at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    
  3. Solution

    Include the custom annotator properties in the request itself: wget --post-data 'Both blue and light blue are nice colors.' 'localhost:9000/?properties={"color.rules":"tokensregex/color.rules.txt","customAnnotatorClass.color":"edu.stanford.nlp.pipeline.TokensRegexAnnotator","annotators":"tokenize,ssplit,pos,lemma,ner,regexner,color","enforceRequirements":"false","outputFormat":"json"}' -O -
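
    For anyone calling the server from a script rather than wget, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function names build_request and annotate are hypothetical, and it assumes the server from step 1 is listening on localhost:9000 and can resolve tokensregex/color.rules.txt relative to its working directory. The property keys and values are the ones used in the wget call.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the CoreNLP server from step 1 is running on localhost:9000.
SERVER_URL = "http://localhost:9000/"


def build_request(text, server_url=SERVER_URL):
    """Build a POST request whose `properties` parameter registers the custom annotator.

    Hypothetical helper; the property values mirror the wget call in the question.
    """
    props = {
        "annotators": "tokenize,ssplit,pos,lemma,ner,regexner,color",
        "customAnnotatorClass.color": "edu.stanford.nlp.pipeline.TokensRegexAnnotator",
        "color.rules": "tokensregex/color.rules.txt",
        "enforceRequirements": "false",
        "outputFormat": "json",
    }
    # The properties travel as one URL-encoded JSON string in the query part.
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return urllib.request.Request(
        server_url + "?" + query,
        data=text.encode("utf-8"),  # the raw text to annotate is the POST body
        method="POST",
    )


def annotate(text):
    """Send the text to the server and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(text)) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running server; prints whatever JSON the server returns.
    print(annotate("Both blue and light blue are nice colors."))
```

    Building the query with urlencode avoids hand-escaping the quotes and braces of the JSON properties string, which is easy to get wrong when the URL is typed in a shell.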

1 answer:

Answer 0 (score: 4)

Adding

"enforceRequirements":"false"

to the properties in your request should stop this error!