Question

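I am running the code block under "3. Word to Vectors Integration" in this NLP tutorial. It uses spaCy's vocabulary to compute the words most similar to any word you give it, and I use it to try to find the closest synonyms of a given word. However, if you replace "apple" with a word like "look", you get many related words that are not synonyms (for example: pretty, has, and so on). I am thinking of modifying the code to filter by part of speech, so that the output contains only verbs and I can pick synonyms from those. To do that I need tokens so that I can use token.pos_, because that attribute is not available on lexemes. Does anyone know a way to take the output (the list called "others" in the code) and convert it from lexemes to tokens? I was reading the spaCy documentation on the vocabulary here, but I have not found anything about conversion.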
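A minimal plain-Python sketch of the idea described above: rank candidate words by cosine similarity to the target word and keep only those with the wanted part of speech. The vectors and POS tags are made-up toy values (assumptions standing in for spaCy's word vectors and a tagger's output), and `nearest_with_pos` is a hypothetical helper, not part of spaCy.

```python
import math

# Toy data, made up for illustration: in the real code the vectors come from
# spaCy's vocab and the tags from a POS tagger.
VECTORS = {
    "look": [0.9, 0.1, 0.0],
    "see": [0.8, 0.2, 0.1],
    "watch": [0.85, 0.15, 0.05],
    "pretty": [0.7, 0.0, 0.6],
    "has": [0.6, 0.3, 0.5],
}
POS_TAGS = {"look": "VERB", "see": "VERB", "watch": "VERB", "pretty": "ADJ", "has": "AUX"}


def cosine(v1, v2):
    """Cosine similarity: dot(v1, v2) / (|v1| * |v2|)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)


def nearest_with_pos(word, pos=None, n=10):
    """Return up to n words most similar to `word`, optionally only those tagged `pos`."""
    target = VECTORS[word]
    candidates = [
        w for w in VECTORS
        if w != word and (pos is None or POS_TAGS.get(w) == pos)
    ]
    # Most similar first (descending, like the tutorial's sort() followed by reverse()).
    candidates.sort(key=lambda w: cosine(VECTORS[w], target), reverse=True)
    return candidates[:n]


print(nearest_with_pos("look"))          # ['watch', 'see', 'pretty', 'has']
print(nearest_with_pos("look", "VERB"))  # ['watch', 'see']
```

With the toy data, the unfiltered ranking for "look" includes "pretty" and "has", and the VERB filter drops them, which is the behaviour the question is after.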

I also tried appending a piece of my own code to the end of the tutorial's code:

from numpy import dot
from numpy.linalg import norm
import spacy
from spacy.lang.en import English
nlp = English()                        # blank pipeline, used by the added code below
parser = spacy.load('en_core_web_md')  # model that provides the word vectors
my_word = u'calm'
# Look up the lexeme (and its word vector) for my_word; the name "apple" comes from the tutorial
apple = parser.vocab[my_word]
# Cosine similarity function
cosine = lambda v1, v2: dot(v1, v2) / (norm(v1) * norm(v2))
# All lowercase vocabulary entries that have a vector, excluding my_word itself
others = list({w for w in parser.vocab if w.has_vector and w.orth_.islower()
               and w.lower_ != my_word})
print("done listing")
# sort by similarity score, most similar first
others.sort(key=lambda w: cosine(w.vector, apple.vector))
others.reverse()
for word in others[:10]:
    print(word.orth_)

The part I added:

b = ""
for word in others[:10]:

    a = str(word) + ' '
    b += a

doc = nlp(b)
print(doc)
token = doc[0]

counter = 1

while counter < 50:
    token += doc[counter]
    counter += 1

print(token)

This is the error output:

token += doc[counter]
TypeError: unsupported operand type(s) for +=: 'spacy.tokens.token.Token' and 'spacy.tokens.token.Token'
<spacy.lexeme.Lexeme object at 0x000002920ABFAA68> <spacy.lexeme.Lexeme object at 0x000002920BD56EE8>
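Two things go wrong in the added code. `str(word)` on a Lexeme returns the default object representation (that is where the `<spacy.lexeme.Lexeme object at 0x...>` text in the output comes from), so the string passed to `nlp` does not contain the words themselves; and Token objects do not support `+`, which is what the TypeError reports. A minimal sketch of a fix, assuming `spacy.blank("en")` in place of the question's `English()` and three hand-picked words in place of `others[:10]`: join the lexemes' `.orth_` strings, then iterate over (or slice) the resulting Doc. A blank pipeline has no tagger, so `token.pos_` stays empty here; POS tags would have to come from a model such as the `parser` (en_core_web_md) loaded in the question.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) stands in for the
# question's `nlp = English()`; no model download is needed for this part.
nlp = spacy.blank("en")

# Hypothetical stand-ins for `others[:10]`: Lexeme objects looked up in the vocab.
lexemes = [nlp.vocab[w] for w in ["look", "see", "watch"]]

# str(lexeme) gives the default "<spacy.lexeme.Lexeme object at 0x...>" text,
# so join the lexemes' .orth_ strings instead.
text = " ".join(lex.orth_ for lex in lexemes)
doc = nlp(text)

# Token objects don't support "+": iterate over the Doc to get each Token,
# or slice it, which returns a Span covering several tokens.
tokens = [token.text for token in doc]
first_two = doc[0:2]
print(tokens)          # ['look', 'see', 'watch']
print(first_two.text)  # look see
```

Running the words through `parser(text)` instead would fill in `token.pos_`, although a space-separated list of words gives the tagger little context, so the tags may not be reliable.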

Does anyone have suggestions for fixing what I did, or another way to convert a lexeme into a token? Thanks!

Answer 1

You have to create a Doc to get a token, because a Token doesn't own any data; it's just a view into a Doc. So you can do this:

from spacy.tokens import Doc
doc = Doc(lex.vocab, words=[lex.orth_])  # doc[0] is the Token for lex
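Building on the answer, a small helper that wraps one lexeme in a one-word Doc, optionally runs a pipeline's components over it, and returns the Token. `lexeme_to_token` is a hypothetical name; the loop over `nlp.pipeline` mirrors how spaCy's documentation describes `nlp()` applying its components in order. The example uses `spacy.blank("en")` only so it runs without downloading a model; with the question's code you would pass `parser` (en_core_web_md), since the lexemes in `others` come from `parser.vocab` and the Doc should share that vocab.

```python
import spacy
from spacy.tokens import Doc


def lexeme_to_token(lex, nlp=None):
    """Wrap a Lexeme in a one-word Doc and return that Doc's only Token.

    If `nlp` is given, its pipeline components are applied in order, so a
    pipeline with a tagger (e.g. en_core_web_md) fills in token.pos_.
    `nlp` should share the lexeme's vocab.
    """
    doc = Doc(lex.vocab, words=[lex.orth_])
    if nlp is not None:
        for _name, proc in nlp.pipeline:
            doc = proc(doc)
    return doc[0]


# Blank pipeline (no components) so the example runs without a model download;
# with it, token.pos_ is left unset.
nlp = spacy.blank("en")
lex = nlp.vocab["look"]
token = lexeme_to_token(lex, nlp)
print(token.text)  # look
```

In the question's setup, filtering `others` with something like `lexeme_to_token(lex, parser).pos_ == "VERB"` would approximate the verb-only output. Tagging a word in isolation gives the tagger no context, though, so a word like "look" may come back as NOUN rather than VERB; treat the tags as a rough filter.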

Converting from lexeme to token in spaCy