Question

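Using NLTK, I want to write a tag pattern that handles noun phrases such as gerunds and/or coordinated nouns. After importing the basic libraries, I tokenize my candidate text as follows: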

sentences=nltk.word_tokenize('......')

The text contains several sentences. I then tag it with:

sentences=nltk.pos_tag(sentences)

I also defined my proposed grammar as:

grammar= r"""
Gerunds: {<DT>?<NN>?<VBG><NN>}
Coordinated noun: {<NNP><CC><NNP>|<DT><PRP\$><NNS><CC><NNS>|<NN><NNS><CC><NNS>}
"""

Then I run:

cp=nltk.RegexpParser(grammar);
for sent in sentences:
   tree = cp.parse(sent)
   for subtree in tree.subtrees():
     if subtree.label()=='Gerunds': print(subtree)
print(cp.parse(sentences));

It fails with ValueError: chunk structures must contain tagged tokens or trees

How should I fix this?

Answer 1

I imported:

from nltk import word_tokenize, pos_tag

Then, instead of tree = cp.parse(sent) and print(cp.parse(sentences)), I used

tree = cp.parse(pos_tag(word_tokenize(sentences)))

and

print(cp.parse(pos_tag(word_tokenize(sentences))))

It worked like a charm! :-)
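For anyone hitting the same error, a likely explanation is that nltk.pos_tag returns one flat list of (word, tag) tuples, so looping over it passes a single tuple to cp.parse instead of a whole tagged sentence. Here is a minimal sketch of the difference. The hand-tagged tokens and the one-rule grammar are my own stand-ins for illustration (not the asker's data), chosen so the example runs without downloading any tokenizer or tagger models:

```python
import nltk

# Simplified one-rule version of the question's grammar (each rule on one line).
grammar = r"""
Gerunds: {<DT>?<NN>?<VBG><NN>}
"""
cp = nltk.RegexpParser(grammar)

# Hypothetical hand-tagged sentence, standing in for
# nltk.pos_tag(nltk.word_tokenize(text)).
tagged = [("the", "DT"), ("swimming", "VBG"), ("pool", "NN"),
          ("is", "VBZ"), ("open", "JJ")]

# Correct: parse the whole tagged sentence, a list of (word, tag) tuples.
tree = cp.parse(tagged)
gerunds = [st.leaves() for st in tree.subtrees() if st.label() == "Gerunds"]
print(gerunds)

# Incorrect: iterating over the flat list hands parse() one (word, tag) tuple,
# whose elements are plain strings, and that raises the ValueError.
try:
    cp.parse(tagged[0])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

If the text really contains several sentences, one option is to split it first with nltk.sent_tokenize and then tokenize, tag, and parse each sentence separately. That route needs the NLTK tokenizer and tagger data to be installed.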
