Question

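I am working on an NLP project on Windows. Whenever I run Stanford CoreNLP from the command prompt, it takes about 14-15 seconds to generate the XML output for a given input text file. I think this is because the library takes a long time to load. Could someone explain what the problem is and how to fix it? This delay is a serious issue for my project.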

Answer 1

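Stanford CoreNLP uses large model parameter files for its various components. Yes, they take a long time to load. What you want to do is start the program only once and then feed it lots of text.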

How you do that depends on what you are doing:

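You can pass -filelist to the command-line version so that it processes many files in a single run.
You can keep a single StanfordCoreNLP object running, send files to it, and get the output through the API.
Depending on which NLP processing you need, you can also speed up startup by not loading models you do not use. See the "annotators" property.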
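
As a rough illustration of the first and third options, here is a minimal sketch that uses only the Java standard library. It assumes the file list is a plain text file with one input path per line, and it assumes "tokenize, ssplit, pos" is all your task needs; pick whichever annotators you actually use. The class name CoreNlpBatchSetup, its method names, and the output file name corenlp-files.lst are made up for this example and are not part of CoreNLP.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of CoreNLP: prepares the inputs for a single batch run.
public class CoreNlpBatchSetup {

    // Collects the absolute paths of all regular .txt files directly inside inputDir, sorted.
    public static List<String> listTextFiles(Path inputDir) throws IOException {
        try (Stream<Path> entries = Files.list(inputDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".txt"))
                    .map(p -> p.toAbsolutePath().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Writes one path per line (the file list layout assumed in this sketch).
    public static Path writeFileList(Path inputDir, Path fileList) throws IOException {
        Files.write(fileList, listTextFiles(inputDir));
        return fileList;
    }

    // Builds properties that name only the requested annotators,
    // so models for unlisted annotators are never requested.
    public static Properties minimalProperties(String... annotators) {
        Properties props = new Properties();
        props.setProperty("annotators", String.join(", ", annotators));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Path inputDir = Paths.get(args.length > 0 ? args[0] : ".");
        // The .lst extension keeps the list itself from being picked up as an input file.
        Path fileList = writeFileList(inputDir, Paths.get("corenlp-files.lst"));
        System.out.println("Wrote " + fileList.toAbsolutePath());
        System.out.println(minimalProperties("tokenize", "ssplit", "pos"));
    }
}
```

The generated file could then be handed to the command-line version with -filelist, and the Properties object could be passed to the StanfordCoreNLP constructor, so the models are loaded once for the whole batch rather than once per file.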

2016 update: the documentation page "Understanding memory and time usage" now has more information on this.

Answer 2

Christopher is right. Here is one of the solutions:

import java.util.Properties;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class SentimentAnalyzer {
    private StanfordCoreNLP pipeline;

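    // Loads every model named in "annotators". This is the slow step, so call it only once.
    public void initializeCoreNLP() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, sentiment");
        pipeline = new StanfordCoreNLP(props);
    }

    // Reuses the already loaded pipeline. T and the elided lines are placeholders, left as in the original.
    public T getSentiment(String text) {
        ...
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);
        ...
        return ...
    }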

    public static void main(String[] argv) {
        SentimentAnalyzer sentimentAnalyzer = new SentimentAnalyzer();
        sentimentAnalyzer.initializeCoreNLP(); // run this only once
        T t = sentimentAnalyzer.getSentiment("put text here..."); // run this multiple times
    }
}

Answer 3

To see how to use the API, check the sample code "NERDemo.java" in the downloaded CoreNLP folder.
