Question

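My dataset contains 100,000 text files, and I am trying to process them with CoreNLP. The desired result is 100,000 finished output text files that classify each sentence as having positive, negative, or neutral sentiment. To get from one text file to another, I use the CoreNLP jar file with the command line below.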

 java -cp "*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -fileList list.txt

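This takes a very long time to finish, because I cannot get the model to pick up each file in the file list; instead it treats the single path line as the input to the model.

I have also tried some of the other approaches described at this link, but I could not get results from them: https://stanfordnlp.github.io/CoreNLP/cmdline.html#classpath

Is there a better, faster way to speed up this process?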

Answer 1

Try this command:

 java -Xmx14g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,parse,sentiment -parse.model edu/stanford/nlp/models/srparser/englishSR.ser.gz -outputFormat text -filelist list.txt

It will use the faster shift-reduce parser. This will go through each file in list.txt (one file per line) and process it.
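The command reads its input from list.txt, with one file path per line. A minimal sketch of building such a list follows, assuming the input files end in .txt and sit under a single directory. The function name make_file_list and the example paths are hypothetical and not part of CoreNLP.

```shell
# Hypothetical helper: write the path of every .txt file under a directory
# to a list file, one path per line, sorted for a stable processing order.
# Assumption: paths contain no newlines.
make_file_list() {
  input_dir="$1"   # directory holding the text files to process
  list_file="$2"   # output list; keep it outside input_dir or give it a
                   # non-.txt name so it does not end up listing itself
  find "$input_dir" -type f -name '*.txt' | sort > "$list_file"
}

# Example call (paths are placeholders):
#   make_file_list /path/to/texts list.txt
```

The resulting file can then be passed to the -filelist option shown in the command.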

Speeding up annotation time in CoreNLP sentiment
