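I am trying to compute the similarity between pairs of words with word2vec. It works when I do it manually, one pair at a time, but I have two large txt files and want to do it in a loop. I tried several ways of looping without success, so I decided to ask the experts.
My code: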
import gensim
model = gensim.models.Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
with open('myfile1.txt', 'r') as f:
    data1 = f.readlines()
with open('myfile2.txt', 'r') as f:
    data2 = f.readlines()
data = zip(data1, data2)
with open('myoutput.txt', 'a') as f:
    for x in data:
        output = model.similarity(x[1], x[0])  # read one word from each file
        out = '{} : {} : {}\n'.format(x[0].strip(), x[1].strip(), output)
        f.write(out)
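One likely reason the loop above fails: `readlines()` keeps the trailing newline on every line, so `model.similarity` receives strings such as 'street\n' instead of 'street'. The `.strip()` calls only clean up the text written to the output file, not the words passed to the model, and gensim looks words up by exact key, so a word with a newline attached is almost certainly missing from the vocabulary and raises a KeyError. The sketch below shows the effect with a plain Python set standing in for the model's vocabulary (the set is a hypothetical stand-in, not the GoogleNews vectors).

```python
import io

# Hypothetical stand-in for the model's vocabulary; with gensim the
# question would be whether the word is a key of the loaded vectors.
VOCAB = {"street", "spain", "ice", "man"}

f = io.StringIO("street\nspain\nice\nman\n")  # simulates myfile1.txt
raw = f.readlines()                        # every item still ends in '\n'
cleaned = [line.strip() for line in raw]   # newline (and spaces) removed

raw_hits = [w in VOCAB for w in raw]
clean_hits = [w in VOCAB for w in cleaned]
print(raw_hits)    # [False, False, False, False]
print(clean_hits)  # [True, True, True, True]
```

Stripping each word before calling `model.similarity` (as the answer below does with `rstrip()`) removes this mismatch.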
My input 1 (text1):
street
spain
ice
man
My input 2 (text2):
florist
paris
cold
kid
The output I want (output.txt):
street florist 0.19991447551502498
spain paris 0.5380033328157873
ice cold 0.40968857572410483
man kid 0.42953233870042506
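The desired output pairs line i of the first file with line i of the second file. A minimal sketch of that pairing with the built-in `zip`, using the example inputs above as in-memory strings instead of files:

```python
# The two example inputs from the question, one word per line.
text1 = "street\nspain\nice\nman\n"    # myfile1.txt
text2 = "florist\nparis\ncold\nkid\n"  # myfile2.txt

words1 = [line.strip() for line in text1.splitlines()]
words2 = [line.strip() for line in text2.splitlines()]

# zip pairs line i of the first file with line i of the second file
# and stops at the end of the shorter list.
pairs = list(zip(words1, words2))
print(pairs)
# [('street', 'florist'), ('spain', 'paris'), ('ice', 'cold'), ('man', 'kid')]
```

Each pair would then go to `model.similarity(w1, w2)`. Note that `zip` silently drops extra lines if one file is longer; on Python 3.10+ `zip(words1, words2, strict=True)` raises a ValueError instead.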
Answer (score: 0)
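from gensim.models import KeyedVectors

# gensim >= 1.0 loads pretrained vectors through KeyedVectors; older releases
# used gensim.models.Word2Vec.load_word2vec_format instead.
model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

# Read each file into a list of words, stripping the trailing newline.
# ('rU' mode is unnecessary in Python 3 and was removed in 3.11.)
file1 = []
file2 = []
with open('myfile1.txt', 'r') as f:
    for line in f:
        file1.append(line.rstrip())
with open('myfile2.txt', 'r') as f1:
    for line1 in f1:
        file2.append(line1.rstrip())

# Score every word in file1 against every word in file2.
with open('Output2.txt', 'w') as f:
    for i in file1:
        for g in file2:
            w = model.similarity(i, g)
            f.write(i + ',' + g + ',' + str(w) + '\n')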
Your problem is with the loops: the two loops should be nested together.
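The nested loops in this answer compare every word in myfile1.txt with every word in myfile2.txt, so four words per file give 4 x 4 = 16 lines, whereas the output requested in the question pairs line i with line i (4 lines). The sketch below contrasts the two iteration patterns and writes lines in the question's "word1 word2 score" format. It assumes hypothetical 2-D toy vectors in place of the GoogleNews model; `cosine` mimics what `model.similarity` computes (cosine similarity), and `TOY_VECTORS`, `cosine`, and `write_scores` are illustrative names only.

```python
import io
import math
from itertools import product

# Hypothetical toy vectors standing in for the GoogleNews model; real
# scores would come from model.similarity(w1, w2).
TOY_VECTORS = {
    "street": (1.0, 0.0), "spain": (0.6, 0.8), "ice": (0.0, 1.0), "man": (0.8, 0.6),
    "florist": (0.9, 0.1), "paris": (0.5, 0.9), "cold": (0.1, 1.0), "kid": (0.7, 0.7),
}


def cosine(w1, w2):
    """Cosine similarity between the toy vectors of two words."""
    a, b = TOY_VECTORS[w1], TOY_VECTORS[w2]
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def write_scores(pairs, out):
    """Write one 'word1 word2 score' line per (word1, word2) pair."""
    for w1, w2 in pairs:
        out.write(f"{w1} {w2} {cosine(w1, w2)}\n")


words1 = ["street", "spain", "ice", "man"]
words2 = ["florist", "paris", "cold", "kid"]

nested = io.StringIO()
write_scores(product(words1, words2), nested)  # same order as the nested loops
paired = io.StringIO()
write_scores(zip(words1, words2), paired)      # line i with line i

print(len(nested.getvalue().splitlines()))  # 16
print(paired.getvalue())
```

`itertools.product` visits pairs in the same order as the two nested `for` loops, so swapping it for `zip` is the only change needed to go from all combinations to the line-by-line pairing.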