Question

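I am working on the following problem: I would like to use Stanford CoreNLP to split a sentence into its sub-sentences. An example sentence might be: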

"Richard is working with CoreNLP, but does not really understand what he is doing"

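I would now like to split the sentence into its individual "S" nodes, as shown in the figure below:

I would like the output to be a list of the individual "S" spans, like this: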

['Richard is working with CoreNLP', ', but', 'does not really understand what', 'he is doing']

I would really appreciate your help :)

Answer 1

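I suspect the tool you are looking for is Tregex, which is described in more detail in the PowerPoint here or in the Javadoc of the class itself.

In your case, I believe the pattern you are looking for is simply S. So, something like:

tregex.sh "S" <path_to_file>

where the file contains trees in Penn Treebank format, i.e. something like (ROOT (S (NP (NNS dogs)) (VP (VB chase) (NP (NNS cats))))).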
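To make it concrete what matching the pattern S means, here is a minimal pure-Python sketch that emulates it on the tree above. It does not use Tregex or CoreNLP; the helpers `parse_tree`, `words` and `find_label` are hypothetical names written for this illustration only.

```python
def parse_tree(text):
    """Parse one Penn Treebank bracketed tree into (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip the opening "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())        # nested constituent
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                      # skip the closing ")"
        return (label, children)

    return read()


def words(node):
    """Return the words covered by a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in words(child)]


def find_label(node, label):
    """Yield every subtree whose label equals `label` (like the pattern S)."""
    if isinstance(node, str):
        return
    if node[0] == label:
        yield node
    for child in node[1]:
        yield from find_label(child, label)


tree = parse_tree("(ROOT (S (NP (NNS dogs)) (VP (VB chase) (NP (NNS cats)))))")
matches = [" ".join(words(s)) for s in find_label(tree, "S")]
print(matches)
```

This tree contains a single S node, so the list has one entry covering the whole sentence.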

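As an aside: I believe the fragment ", but" is not actually a sentence, despite how you highlighted it in the figure. Rather, the node you highlighted covers the whole sentence "Richard is working with CoreNLP, but does not really understand what he is doing". Tregex would then print this whole sentence as one of the matches. Similarly, "does not really understand" is not a sentence unless it covers the entire SBAR: "does not understand what he is doing".

If you only want the "leaf" sentences (i.e. a sentence that is not contained in another sentence), you could try a pattern like: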

S !>> S

Note: I have not tested these patterns. Use at your own risk!
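Since the patterns above are untested, it may help to spell out the two possible readings of "leaf". As far as I understand the Tregex relations, `>>` means "is dominated by" and `<<` means "dominates", so `S !>> S` selects S nodes with no S ancestor (the outermost ones), while S nodes that contain no other S would be `S !<< S`. The sketch below emulates both filters in plain Python on a hand-written, hypothetical parse; the tree and the helper names are assumptions for illustration, not CoreNLP output.

```python
# Hypothetical hand-written parse, as nested (label, children) tuples.
TREE = ("ROOT", [
    ("S", [
        ("NP", [("NNP", ["Richard"])]),
        ("VP", [
            ("VBZ", ["knows"]),
            ("SBAR", [
                ("WHNP", [("WP", ["what"])]),
                ("S", [
                    ("NP", [("PRP", ["he"])]),
                    ("VP", [("VBZ", ["does"])]),
                ]),
            ]),
        ]),
    ]),
])


def words(node):
    """Return the words covered by a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in words(child)]


def contains_label(node, label):
    """True if some proper descendant of `node` carries `label`."""
    if isinstance(node, str):
        return False
    return any(
        not isinstance(c, str) and (c[0] == label or contains_label(c, label))
        for c in node[1]
    )


def s_nodes(node, inside_s=False):
    """Yield (subtree, has_s_ancestor) for every S node in the tree."""
    if isinstance(node, str):
        return
    if node[0] == "S":
        yield node, inside_s
        inside_s = True
    for child in node[1]:
        yield from s_nodes(child, inside_s)


# Emulates S !>> S: S nodes that no other S dominates.
outermost = [" ".join(words(n)) for n, has_anc in s_nodes(TREE) if not has_anc]
# Emulates S !<< S: S nodes that dominate no other S.
innermost = [" ".join(words(n)) for n, _ in s_nodes(TREE) if not contains_label(n, "S")]

print(outermost)
print(innermost)
```

Which of the two filters is wanted depends on whether the goal is the largest or the smallest clause spans.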

Answer 2

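OK, I found that one way to do this is as follows:

import requests

# Assumes a CoreNLP server is already running locally on port 9000.
url = "http://localhost:9000/tregex"
request_params = {"pattern": "S"}  # Tregex pattern: match every S node
text = "Pusheen and Smitha walked along the beach."
# Encode explicitly so that non-ASCII text (e.g. German) is sent as UTF-8.
r = requests.post(url, data=text.encode("utf-8"), params=request_params)
print(r.json())

Does anyone know how to do this for other languages (I need German)?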

Splitting a sentence into sub-sentences with CoreNLP