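I have two lists.
I am using word vectors and cosine similarity to find similar words, based on the cosine of the angle between two vectors.
I have already defined the functions that build the vectors and compute the cosine similarity, so I have not included them here.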
tar1 = ['apple', 'fruit', 'vegetable', 'school']
tar2 = ['fruit', 'apple', 'school', 'vegetable']
# compare every word in tar1 with every word in tar2
for i in range(len(tar1)):
    vect1 = text_to_vector(tar1[i].strip().lower())
    for j in range(len(tar2)):
        vect2 = text_to_vector(tar2[j].strip().lower())
        cosine = get_cosine(vect1, vect2)
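In the nested loop, once the inner loop has finished, I want to pick the string with the highest cosine similarity.
For example, the first item in tar1 is 'apple', and 'apple' in tar2 has the highest cosine similarity to it, so that is the word that should be selected.
I am looking for output like the following.
o/p = ['apple', 'fruit', 'vegetable', 'school']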
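The question does not show `text_to_vector` or `get_cosine`. For anyone who wants to run the snippet, here is a minimal sketch of what such helpers might look like, assuming a simple bag-of-words count vector built with `collections.Counter`. This is an assumption for illustration only, not the asker's actual code, and the `WORD` pattern is a name introduced here.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: runs of word characters (hypothetical, not from the question).
WORD = re.compile(r"\w+")

def text_to_vector(text):
    """Map a string to a bag-of-words count vector."""
    return Counter(WORD.findall(text))

def get_cosine(vec1, vec2):
    """Cosine similarity of two count vectors; 0.0 if either is empty."""
    shared = set(vec1) & set(vec2)
    numerator = sum(vec1[t] * vec2[t] for t in shared)
    norm1 = math.sqrt(sum(c * c for c in vec1.values()))
    norm2 = math.sqrt(sum(c * c for c in vec2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return numerator / (norm1 * norm2)
```

With vectors like these, two single words score 1.0 only when they are identical and 0.0 otherwise, so a real word-embedding model would be needed to capture similarity between different words.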
Answer 0 (score: 0)
A possible implementation that gets what you want (with comments):
def text_to_vector(text):
    return text

def get_cosine(x, y):
    return 1 if x == y else 0

tar1 = ['apple', 'fruit', 'vegetable', 'school']
tar2 = ['fruit', 'apple', 'school', 'vegetable']
result = list()
# iterate over words in tar1
for dummy_idx_1, vector_1 in enumerate(text_to_vector(word) for word in tar1):
    # keep track of the maximum cosine and most similar word
    max_cosine, best_word = -1, None
    # iterate over words in tar2 for every word in tar1
    for idx_2, vector_2 in enumerate(text_to_vector(word) for word in tar2):
        # compute cosine
        cosine = get_cosine(vector_1, vector_2)
        # check if current word from tar2 is the most similar to the word from tar1
        if cosine > max_cosine:
            max_cosine, best_word = cosine, tar2[idx_2]
    # remember result for every word from tar1
    result.append(best_word)
print(result)
The output is:
['apple', 'fruit', 'vegetable', 'school']
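The same selection can also be written more compactly with the built-in `max` and a `key` function. The sketch below reuses the placeholder `text_to_vector` and `get_cosine` from this answer, which stand in for the real implementations, and `most_similar` is a helper name introduced here for illustration.

```python
def text_to_vector(text):
    # Placeholder, as in the answer: the "vector" is just the text itself.
    return text

def get_cosine(x, y):
    # Placeholder similarity: 1 for identical inputs, 0 otherwise.
    return 1 if x == y else 0

def most_similar(word, candidates):
    """Return the candidate with the highest similarity to `word`."""
    vector = text_to_vector(word.strip().lower())
    return max(
        candidates,
        key=lambda c: get_cosine(vector, text_to_vector(c.strip().lower())),
    )

tar1 = ['apple', 'fruit', 'vegetable', 'school']
tar2 = ['fruit', 'apple', 'school', 'vegetable']

# Pick, for every word in tar1, its best match in tar2.
result = [most_similar(word, tar2) for word in tar1]
print(result)  # ['apple', 'fruit', 'vegetable', 'school']
```

On ties, `max` returns the first candidate with the highest score, which matches the strict `>` comparison in the loop version. It raises `ValueError` if `candidates` is empty, so an empty `tar2` would need separate handling.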