Question

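According to the documentation, I can use options such as ssplit.isOneSentence to control how my document is split into sentences. Given a StanfordCoreNLP object, how exactly do I do this?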

Here is my code:

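import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.CoreMap;

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, depparse");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props); // build the pipeline from props
Annotation document = new Annotation(doc);               // doc: the input text (a String)
pipeline.annotate(document);
List<CoreMap> sentences = document.get(SentencesAnnotation.class);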

When and where do I add this option? Something like this?

pipeline.ssplit.boundaryTokenRegex = '"'

I would also like to know how to use this for one specific option, boundaryTokenRegex.

Edit:

I think this is closer to the right approach:

props.put("ssplit.boundaryTokenRegex", "\"");

But I still need to verify it.
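One way to verify the idea without running the full pipeline is to test the regex directly against token strings. The sketch below assumes that ssplit.boundaryTokenRegex is applied as a java.util.regex whole-string match against each token's text, and that the intended value is a single double-quote character; the class and method names are hypothetical. This matters because CoreNLP's tokenizer may rewrite a raw " as a PTB-style `` or '' token, in which case a pattern that only matches " would never fire on tokenized text.

```java
import java.util.List;
import java.util.regex.Pattern;

public class BoundaryTokenRegexCheck {
    // Hypothetical helper: true if one token's text would count as a sentence
    // boundary, ASSUMING boundaryTokenRegex is a whole-token Java regex match.
    static boolean isBoundary(String boundaryTokenRegex, String tokenWord) {
        return Pattern.compile(boundaryTokenRegex).matcher(tokenWord).matches();
    }

    public static void main(String[] args) {
        // The Java literal "\"" is a one-character string containing a double quote.
        String regex = "\"";
        // A raw ASCII quote vs. the PTB-style quote tokens CoreNLP may produce.
        for (String tok : List.of("\"", "''", "``")) {
            System.out.println(tok + " -> " + isBoundary(regex, tok));
        }
    }
}
```

Run as-is, it prints true for the raw quote and false for the two PTB-style tokens.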

Answer 1

The way to mark a sentence as ending wherever a quotation mark (" or '') appears is this:

props.setProperty("ssplit.boundaryMultiTokenRegex", "/\'\'/");

or

props.setProperty("ssplit.boundaryMultiTokenRegex", "/\"/");

depending on how the quote is stored. (CoreNLP normalizes it to the former.)
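As a minimal sketch of where this line goes: ssplit.* keys are ordinary entries in the same Properties object that holds "annotators", and StanfordCoreNLP reads them when the pipeline is constructed, so they need to be set before new StanfordCoreNLP(props) is called. The class and method names below are hypothetical, and the code uses only java.util.Properties so it runs without CoreNLP on the classpath; the pipeline construction is left as a comment. Other ssplit options, such as ssplit.isOneSentence (which takes "true" or "false"), are set the same way.

```java
import java.util.Properties;

public class QuoteSplitConfig {
    // Builds pipeline properties with a custom multi-token sentence boundary.
    // The annotator list mirrors the one in the question.
    static Properties buildProps(String boundaryMultiTokenRegex) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, depparse");
        // ssplit.* settings must be in props BEFORE the pipeline is built from it.
        props.setProperty("ssplit.boundaryMultiTokenRegex", boundaryMultiTokenRegex);
        return props;
    }

    public static void main(String[] args) {
        // "/''/" is a TokensRegex pattern matching one token whose text is ''
        Properties props = buildProps("/''/");
        System.out.println(props.getProperty("ssplit.boundaryMultiTokenRegex"));
        // With CoreNLP on the classpath, the next step would be:
        // StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    }
}
```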

If you want to split on both opening and closing quotes:

props.setProperty("ssplit.boundaryMultiTokenRegex", "/''|``/");
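Assuming the intended pattern is /''|``/, the text between the slashes is an ordinary Java regex that TokensRegex applies to each token's text. The sketch below checks that inner regex against a few candidate tokens using java.util.regex, under the assumption of a whole-token match; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class QuotePatternCheck {
    // Inner regex of the TokensRegex pattern /''|``/ : a PTB closing quote ('')
    // or a PTB opening quote (``).
    static final Pattern QUOTE = Pattern.compile("''|``");

    // ASSUMPTION: the token's whole text must match, as with Matcher.matches().
    static boolean isQuoteToken(String word) {
        return QUOTE.matcher(word).matches();
    }

    public static void main(String[] args) {
        for (String tok : new String[] {"''", "``", "\"", "'"}) {
            System.out.println(tok + " -> " + isQuoteToken(tok));
        }
    }
}
```

Run as-is, it reports a match for '' and `` and no match for a raw " or a lone '.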

Using ssplit options with CoreNLP
