Question

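I have recently been using the Stanford LexParser. Unfortunately, I have run into a problem: parsing takes a long time, especially when I pass in a large file. Would multithreading help improve performance? I know multithreading is easy to do from the command line, but I would like to do it internally, through the API. This is the code I am using at the moment. How can I make it multithreaded?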

for (List<HasWord> sentence : new DocumentPreprocessor(fileReader)) {
        parse = lp.apply(sentence);
        TreePrint tp = new TreePrint("typedDependenciesCollapsed");
        tp.printTree(parse, pw);
}

Answer 1

You can use plain old Java threads to annotate documents in parallel. For example:

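import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;

Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,parse");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);  // built once, shared by all threads

for (int i = 0; i < 100; ++i) {
  new Thread() {
    @Override public void run() {
      // Each thread gets its own Annotation; in practice, annotate a different document per thread.
      Annotation ann = new Annotation("your sentence here");
      pipeline.annotate(ann);
      Tree tree = ann.get(SentencesAnnotation.class).get(0).get(TreeAnnotation.class);
    }
  }.start();
}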

Another option is to use the simple API:

for (int i = 0; i < 100; ++i) {
  new Thread() {
    @Override public void run() {
      Tree tree = new Sentence("your sentence").parse();
    }
  }.start();
}
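
Both snippets above start 100 raw threads and discard the resulting trees. A minimal sketch of the same idea with a fixed-size thread pool follows; it caps the thread count and returns results in input order. The class name `ParallelParseSketch`, the method `parseAll`, and the `fakeParser` function are hypothetical and not part of CoreNLP. `fakeParser` only stands in for the real call (for example `sentence -> lp.apply(sentence)`), and the sketch assumes the real parser object can safely be shared across threads, which is worth checking for your version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelParseSketch {

    // Applies `parser` to every sentence on a fixed pool of `threads` workers
    // and returns the results in the same order as the input list.
    static <S, T> List<T> parseAll(List<S> sentences, Function<S, T> parser, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (S sentence : sentences) {
                Callable<T> task = () -> parser.apply(sentence);
                futures.add(pool.submit(task));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                results.add(future.get());  // blocks until this sentence is finished
            }
            return results;
        } finally {
            pool.shutdown();  // let the worker threads exit once all tasks are done
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the real parser call, e.g. s -> lp.apply(s).
        Function<String, String> fakeParser = s -> "(ROOT " + s + ")";
        List<String> sentences = Arrays.asList("first sentence", "second sentence");
        int threads = Runtime.getRuntime().availableProcessors();
        for (String tree : parseAll(sentences, fakeParser, threads)) {
            System.out.println(tree);
        }
    }
}
```

Collecting the futures in submission order is what keeps the output aligned with the input file, so the trees can still be printed sequentially with a single `TreePrint` after parsing.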

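At a high level, though, you are unlikely to get a very large speedup from multithreading. Parsing is slow in general (O(n^3) in the sentence length), and multithreading gives you at most a linear speedup in the number of cores. Another way to speed things up is to use the shift-reduce parser or, if you are fine with dependency parses rather than constituency parses, the Stanford Neural Dependency Parser.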
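
As a rough sketch of what switching parsers could look like in pipeline configuration, the helper below builds two `Properties` objects. The class and method names are hypothetical. The `parse.model` path is the one commonly documented for the English shift-reduce model and assumes the separate shift-reduce models jar is on the classpath; verify both against your CoreNLP version.

```java
import java.util.Properties;

class FasterParserConfigSketch {

    // Constituency parsing with the shift-reduce model instead of the default PCFG model.
    static Properties shiftReduceProps() {
        Properties props = new Properties();
        // The shift-reduce parser relies on POS tags, so "pos" runs before "parse".
        props.setProperty("annotators", "tokenize,ssplit,pos,parse");
        // Assumed model location; requires the shift-reduce models on the classpath.
        props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz");
        return props;
    }

    // Dependency parsing only, using the neural dependency parser annotator.
    static Properties neuralDependencyProps() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,depparse");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(shiftReduceProps());
        System.out.println(neuralDependencyProps());
        // With CoreNLP on the classpath, either object would be passed to
        // new StanfordCoreNLP(props), as in the first example of this answer.
    }
}
```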
