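Question: finding POS-tag frequency per sentence. Forgive me, I'm a Python newbie; I only started 4 months ago. I was able to figure out how to apply POS tags to the words in a document: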
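import nltk
from nltk.tokenize import PunktSentenceTokenizer

train_text = "SOME TEXT 1"   # placeholder: text used to train the sentence tokenizer
sample_text = "SOME TEXT 2"  # placeholder: text to analyze

# Train a Punkt sentence tokenizer on train_text, then split sample_text into sentences
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
tokenized = custom_sent_tokenizer.tokenize(sample_text)

def process_content():
    try:
        for i in tokenized:
            words = nltk.word_tokenize(i)  # split one sentence into words
            tagged = nltk.pos_tag(words)   # list of (word, tag) tuples for this sentence
            print(tagged)
    except Exception as e:
        print(str(e))

process_content()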
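One way to keep the counts separate per sentence is to store each sentence's tagged output instead of only printing it, and then count tags sentence by sentence. A minimal sketch, assuming `tagged_sentences` is a list of `nltk.pos_tag` results (one list of `(word, tag)` tuples per sentence); the function name `tag_counts_per_sentence` and the sample data are hypothetical:

```python
from collections import Counter
from typing import List, Tuple

# One sentence as returned by nltk.pos_tag: a list of (word, tag) tuples.
TaggedSentence = List[Tuple[str, str]]


def tag_counts_per_sentence(tagged_sentences: List[TaggedSentence]) -> List[Counter]:
    """Return one Counter of POS tags for each sentence (not for the whole document)."""
    # A fresh Counter is built for every sentence, so counts never mix across sentences.
    return [Counter(tag for _word, tag in sentence) for sentence in tagged_sentences]


# Hypothetical example data in the shape nltk.pos_tag returns.
tagged_sentences = [
    [("The", "DT"), ("dog", "NN"), ("barks", "VBZ"), (".", ".")],
    [("Cats", "NNS"), ("and", "CC"), ("dogs", "NNS"), ("play", "VBP"), (".", ".")],
]

counts = tag_counts_per_sentence(tagged_sentences)
for index, sentence_counts in enumerate(counts):
    print(index, dict(sentence_counts))
```

To produce this input from the code in the question, you could create `tagged_sentences = []` before the loop in `process_content` and replace `print(tagged)` with `tagged_sentences.append(tagged)`.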
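Now I don't know where to go from here. At this point I'm interested in counting the frequency of certain POS tags in each sentence. Once I have that, I want to plot sentence length (number of words) against the number of the POS tags I identified. When I try to do this, I only get the frequency of POS tags across the whole document, relative to all the words in the document. Even though I've split the text into sentences, my analysis still covers the entire document. Help, this is driving me crazy!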
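For the plot, each sentence can be reduced to a pair: its length and how many of its tokens carry one of the tags you care about. A sketch under the same assumptions as above; `TARGET_TAGS` (here the Penn Treebank noun tags) is an assumed choice, counting every token (including punctuation) toward sentence length is a simplification, and the plotting uses matplotlib, which is not part of the original code. The function names are hypothetical:

```python
from collections import Counter
from typing import Iterable, List, Tuple

# One sentence as returned by nltk.pos_tag: a list of (word, tag) tuples.
TaggedSentence = List[Tuple[str, str]]

# Assumed target set: Penn Treebank noun tags. Replace with the tags you want to count.
TARGET_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def length_vs_tag_count(
    tagged_sentences: List[TaggedSentence], target_tags: Iterable[str]
) -> List[Tuple[int, int]]:
    """Return (sentence length in tokens, number of target-tag tokens) per sentence."""
    targets = set(target_tags)
    pairs = []
    for sentence in tagged_sentences:
        # Count tags within this sentence only.
        tag_counts = Counter(tag for _word, tag in sentence)
        n_target = sum(tag_counts[t] for t in targets)
        pairs.append((len(sentence), n_target))
    return pairs


def plot_length_vs_tag_count(pairs: List[Tuple[int, int]]) -> None:
    """Scatter plot of sentence length against target-tag count (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so the counting code runs without it

    lengths = [length for length, _count in pairs]
    tag_totals = [count for _length, count in pairs]
    plt.scatter(lengths, tag_totals)
    plt.xlabel("Sentence length (tokens)")
    plt.ylabel("Number of target POS tags")
    plt.show()


# Hypothetical example data in the shape nltk.pos_tag returns.
tagged_sentences = [
    [("The", "DT"), ("dog", "NN"), ("barks", "VBZ"), (".", ".")],
    [("Cats", "NNS"), ("and", "CC"), ("dogs", "NNS"), ("play", "VBP"), (".", ".")],
]
pairs = length_vs_tag_count(tagged_sentences, TARGET_TAGS)
print(pairs)
```

Calling `plot_length_vs_tag_count(pairs)` then draws one point per sentence. If punctuation should not count toward sentence length, filter punctuation tokens out of each sentence before taking `len`.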