With topics of length 100 and lines_data of length 1.5M, how can I speed this up? It currently takes far too long.
My code is as follows:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

for j, top in enumerate(topics):
    del write[:]
    del ranked[:]
    file.write("\n")
    for i, line in enumerate(lines_data):
        word = line[:18]
        # Parse the comma-separated vector, dropping the two trailing characters.
        vector = np.fromstring(line[20:-2], dtype=float, sep=',')
        cos = cosine_similarity(top[1].reshape(1, -1), vector.reshape(1, -1))
        score = cos.reshape(1).tolist()[0]
        if i <= 50:
            ranked.append((top[0], score, word))
            ranked.sort(key=lambda tup: tup[1], reverse=True)
        elif ranked[-1][1] < score:  # compare against the lowest kept score, not the whole tuple
            del ranked[-1]
            ranked.append((top[0], score, word))
            ranked.sort(key=lambda tup: tup[1], reverse=True)
    for rank in ranked[:50]:
        write.append(rank[0] + " " + str(rank[1]) + " " + rank[2])
    file.write("\n".join(write))
Answer 0 (score: 0)
Try rewriting:
for rank in ranked[:50]:
    write.append(rank[0] + " " + str(rank[1]) + " " + rank[2])
as:
for rank in ranked[:50]:
    elements = [rank[0], str(rank[1]), rank[2]]
    write.append(" ".join(elements))
Repeated string concatenation is slow in Python, so switching to join should give a speedup.
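As a rough illustration of the claim, here is a minimal timeit sketch comparing the two approaches; the rank tuple below is a made-up placeholder, not data from the question:

import timeit

# Hypothetical placeholder standing in for one entry of `ranked`.
rank = ("topic_01", 0.87654321, "some_word_here")

def concat():
    # Original approach: repeated `+` creates several intermediate strings.
    return rank[0] + " " + str(rank[1]) + " " + rank[2]

def joined():
    # Suggested approach: build the parts once and join them.
    return " ".join([rank[0], str(rank[1]), rank[2]])

print(timeit.timeit(concat, number=1_000_000))
print(timeit.timeit(joined, number=1_000_000))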