Lemmatizing with spaCy

Time: 2017-08-07 13:13:26

Tags: lemmatization

I have a list of sentences.

list = ["I'm hoping to go jogging", "I haven't eaten in a while","where is everybody going"]

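I want to lemmatize the list above and replace the original words with their lemmas.

How can I do this with spaCy?

I know I can print the lemmas in a loop, but what I want is to replace the original words with their lemmatized forms.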

1 answer:

Answer 0 (score: 1)

It sounds like you are looking for:

# Note: `spacy.en` is the spaCy 1.x import path; from spaCy 2.0 on, the class lives in `spacy.lang.en`.
from spacy.en import English
parser = English()

# Named `sentences` rather than `list` to avoid shadowing the built-in type.
sentences = ["I'm hoping to go jogging", "I haven't eaten in a while", "where is everybody going",
    "Hello, how are you? I'm doing good."]
lemmatized_list = []

for sentence in sentences:
    tokens = parser(sentence)
    lemmas = []
    for tok in tokens:
        # Skip punctuation tokens entirely.
        if not tok.is_punct:
            # Pronouns carry the placeholder lemma "-PRON-"; keep their lowercased text instead.
            lemmas.append(tok.lemma_.lower().strip() if tok.lemma_ != "-PRON-" else tok.lower_)
    # Join the lemmas back into one space-separated sentence.
    lemmatized_list.append(" ".join(lemmas))
print(lemmatized_list)

>>> ['i be hop to go jogging', "i haven't eat in a while", 'where be everybody go', 'hello how be you i be do good']
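The replacement step itself does not depend on a particular spaCy version, so it can be pulled out into a small helper. The following is only a sketch: `lemmatize_tokens` and `FakeToken` are hypothetical names introduced here for illustration, and `FakeToken` merely mimics the three token attributes the answer relies on (`lemma_`, `lower_`, `is_punct`) so the snippet runs without spaCy installed. With a real pipeline you would pass the result of calling it on a sentence, for example `spacy.load("en_core_web_sm")` in spaCy 3, assuming that model is installed. As far as I know, spaCy 3 no longer emits `-PRON-`, in which case that branch simply never fires.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy token, exposing only the attributes used below.
FakeToken = namedtuple("FakeToken", ["lemma_", "lower_", "is_punct"])


def lemmatize_tokens(tokens):
    """Return the lemmas of all non-punctuation tokens as one space-separated string."""
    lemmas = []
    for tok in tokens:
        if tok.is_punct:
            continue  # drop punctuation
        if tok.lemma_ == "-PRON-":
            # Older spaCy versions use "-PRON-" as the pronoun lemma; keep the lowercased word.
            lemmas.append(tok.lower_)
        else:
            lemmas.append(tok.lemma_.lower().strip())
    return " ".join(lemmas)


# Hand-written demo tokens (not produced by spaCy) for the text "I'm hoping."
demo_tokens = [
    FakeToken("-PRON-", "i", False),
    FakeToken("be", "'m", False),
    FakeToken("hope", "hoping", False),
    FakeToken(".", ".", True),
]
print(lemmatize_tokens(demo_tokens))  # prints: i be hope
```

With spaCy itself, the list from the question could then be rewritten as `[lemmatize_tokens(nlp(s)) for s in sentences]`, where `nlp` is whatever pipeline object you loaded.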