Question

I am currently stuck on this problem.

NLTK's chunking works like this:

tokens = nltk.word_tokenize(word)
tagged = nltk.pos_tag(tokens)
chunking = nltk.chunk.ne_chunk(tagged)

Is there any way to lemmatize the tokens, using their tags, before they are chunked? Something like

lmtzr.lemmatize(tokens, pos=tagged)

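I tried to lemmatize the chunk, but it does not work (the error says something about chunking being a list). I am new to Python, so my knowledge of it is not that great. Any help would be great!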

Answer 1

You can lemmatize the tokens directly, without calling pos_tag first:

import nltk
from nltk.corpus import wordnet

lmtzr = nltk.WordNetLemmatizer()
word = "Here are words and cars"
tokens = nltk.word_tokenize(word)
token_lemma = [ lmtzr.lemmatize(token) for token in tokens ]
tagged = nltk.pos_tag(token_lemma)
chunking = nltk.chunk.ne_chunk(tagged)

Output

['Here', 'are', 'word', 'and', 'car'] # lemmatize output
[('Here', 'RB'), ('are', 'VBP'), ('word', 'NN'), ('and', 'CC'), ('car', 'NN')]
(S Here/RB are/VBP word/NN and/CC car/NN)
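
If you do want the lemmatizer to use the POS tags, as the question asks, note that WordNetLemmatizer.lemmatize takes one word at a time, and its pos argument expects a WordNet POS letter ('n', 'v', 'a', 'r') rather than the Penn Treebank tags returned by nltk.pos_tag. Below is a minimal sketch of one way to bridge the two. The helper names penn_to_wordnet, lemmatize_tagged and demo are made up for illustration, and the tag mapping is a simplification that falls back to noun for everything else. It also chunks the original tagged tokens rather than the lemmas, which is a design choice on my part, not something the question requires.

```python
# Sketch only: helper names are hypothetical, not part of NLTK.

def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (from nltk.pos_tag) to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"  # adjective, same value as wordnet.ADJ
    if tag.startswith("V"):
        return "v"  # verb, same value as wordnet.VERB
    if tag.startswith("RB"):
        return "r"  # adverb, same value as wordnet.ADV
    return "n"      # fall back to noun, the lemmatizer's default


def lemmatize_tagged(tagged, lemmatizer):
    """Lemmatize (token, tag) pairs one token at a time."""
    return [
        lemmatizer.lemmatize(token, pos=penn_to_wordnet(tag))
        for token, tag in tagged
    ]


def demo(text="Here are words and cars"):
    # Imported here so the helpers above work without NLTK installed.
    import nltk

    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)           # tag the original tokens
    lemmas = lemmatize_tagged(tagged, nltk.WordNetLemmatizer())
    chunking = nltk.chunk.ne_chunk(tagged)  # chunk the original tagged tokens
    return lemmas, chunking
```

Calling demo() needs the usual NLTK data packages (tokenizer, tagger, WordNet, NE chunker) to be downloaded. Tagging before lemmatizing means the tagger sees the original word forms, which is usually what it was trained on.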

NLTK - Lemmatizing the tokens before being chunked