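I am trying to work out whether a given sentence is a question or an ordinary statement, without defining any chunk grammar. I tried drawing a parse tree, which requires a grammar, but the tree does not tell me whether the sentence is a question or a statement. I have heard that the Penn Treebank is one possible solution, but I could not find any help on how to use it for this.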
import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer

# Train a Punkt sentence tokenizer on one text, then use it to split the other.
train_text = state_union.raw("text1.txt")
sample_text = state_union.raw("text2.txt")
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
tokenized = custom_sent_tokenizer.tokenize(sample_text)
print(tokenized)

# Chunk grammar: optional adverbs and verbs, one or more proper nouns, optional noun.
chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP>+<NN>?}"""
chunkParser = nltk.RegexpParser(chunkGram)

try:
    for i in tokenized:
        words = nltk.word_tokenize(i)
        tagged = nltk.pos_tag(words)  # list of (word, Penn Treebank tag) pairs
        print(tagged)
        chunked = chunkParser.parse(tagged)
        print(chunked)
        chunked.draw()  # opens a window showing the chunk tree
except Exception as e:
    print(str(e))
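
One way to approach this without a chunk grammar is to look directly at the Penn Treebank tags that nltk.pos_tag already returns. The sketch below is only a heuristic, not a Penn Treebank feature: it assumes a sentence is a question if it ends in "?", opens with a wh-tag (WDT, WP, WP$, WRB), or opens with a modal (MD) or a common auxiliary. The name is_question and the AUXILIARIES list are hypothetical and were chosen for illustration. In full Penn Treebank parse trees, questions carry the clause labels SBARQ and SQ, but producing those requires a full parser rather than a tagger.

```python
# Penn Treebank tags for wh-words (which, what, whose, where, ...).
WH_TAGS = {"WDT", "WP", "WP$", "WRB"}
# Auxiliaries that can open an inverted yes/no question.
# Illustrative list only (an assumption), not exhaustive.
AUXILIARIES = {"is", "are", "was", "were", "am", "do", "does", "did",
               "have", "has", "had"}


def is_question(tagged):
    """Guess whether a POS-tagged sentence is a question.

    `tagged` is a list of (word, tag) pairs, such as the output of nltk.pos_tag.
    """
    if not tagged:
        return False
    first_word, first_tag = tagged[0]
    last_word, _ = tagged[-1]
    # An explicit question mark is the strongest signal.
    if last_word == "?":
        return True
    # A wh-word in first position suggests a wh-question.
    if first_tag in WH_TAGS:
        return True
    # A modal or auxiliary in first position suggests subject-auxiliary inversion.
    return first_tag == "MD" or first_word.lower() in AUXILIARIES


print(is_question([("Where", "WRB"), ("is", "VBZ"), ("Congress", "NNP")]))  # True
print(is_question([("Congress", "NNP"), ("passed", "VBD"), ("the", "DT"),
                   ("bill", "NN"), (".", ".")]))  # False
```

In the loop above, this could be called as is_question(tagged) for each sentence. The first-position rules will misfire on imperatives and exclamations such as "Do it." or "What a day.", so treat the result as a rough guess rather than a reliable classification.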