AttributeError: 'list' object has no attribute 'similarity'

Time: 2019-05-05 14:41:27

Tags: python python-3.x spacy

ws = {}
nlp = spacy.load('de_core_news_sm')
data = 'Some long text'
train_corpus = nlp(data)
train_corpus = [token.text for token in train_corpus if not token.is_stop and len(token) > 4]
test_corpus = nlp('Some short sentence')   
ae = train_corpus.similarity(test_corpus)

I get AttributeError: 'list' object has no attribute 'similarity' at the line ae = train_corpus.similarity(test_corpus). If I remove the line train_corpus = [token.text for token in train_corpus if not token.is_stop and len(token) > 4], it works, but the stop words are still included.

How can I remove the stop words so that it still works?

Edit: ae = nlp(train_corpus).similarity(test_corpus) raises TypeError: Argument 'string' has incorrect type (expected str, got list)

1 Answer:

Answer 0: (score: 0)

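Note that you are using a German model on English phrases. The list comprehension turns the Doc into a plain list of strings, which has no similarity method, so you need to join the remaining tokens back into a single string and create a spaCy Doc from it again. Also, with your sample text every token would be removed by the len(token) > 4 condition anyway, which is why the example adds a longer word.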

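import spacy

# English model, since the example text is English
nlp = spacy.load('en_core_web_sm')
# nlp = spacy.load('de_core_news_sm')
ws = {}
# data = 'Some long text'  # no token is longer than 4 characters, so nothing would survive the filter
data = 'Some long text Elephant'
train_corpus = nlp(data)
# Drop stop words and short tokens, join the rest back into one string,
# and pass it through nlp() again so the result is a Doc rather than a list
train_corpus = nlp(" ".join([token.text for token in train_corpus if not token.is_stop and len(token) > 4]))
test_corpus = nlp('Some short sentence')
ae = train_corpus.similarity(test_corpus)

print(ae)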
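
The filter-and-rejoin step can also be pulled out into a small helper, which makes the empty-result case easy to spot. This is only a sketch: filtered_text, TokenLike and FakeToken are hypothetical names that are not part of spaCy, and FakeToken is a stand-in that lets the snippet run without a model installed. The helper relies only on the text and is_stop attributes that spaCy tokens provide.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class TokenLike(Protocol):
    """Hypothetical protocol: the two spaCy Token attributes used below."""
    text: str
    is_stop: bool


def filtered_text(tokens: Iterable[TokenLike], min_len: int = 5) -> str:
    """Join the texts of tokens that are not stop words and have at least
    min_len characters (min_len=5 matches the len(token) > 4 condition)."""
    kept = [t.text for t in tokens if not t.is_stop and len(t.text) >= min_len]
    return " ".join(kept)


@dataclass
class FakeToken:
    """Hypothetical stand-in for a spaCy Token, for illustration only."""
    text: str
    is_stop: bool = False


tokens = [
    FakeToken("Some", is_stop=True),
    FakeToken("long"),
    FakeToken("text"),
    FakeToken("Elephant"),
]
print(filtered_text(tokens))  # Elephant
```

With spaCy, the same helper would be called as filtered_text(nlp(data)) and the returned string passed to nlp() again before calling similarity. It is probably worth checking that the string is not empty first, since a similarity score for an empty Doc carries no information.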