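I have a CSV of 8371 books, and for each book I want to pick out the 6 most relevant other books. I have managed to get the top 6 for a single book. Now I want a for loop that goes through all 8371 books and attaches to each one its list of 6 related books.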
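# Assumed imports (not shown in the original post)
import numpy as np
import pandas as pd
from numpy.linalg import norm
from sklearn.feature_extraction.text import CountVectorizer

def tf_similarity(s1, s2):
    """Cosine similarity between the character-count vectors of two strings."""
    def add_space(s):
        # Put a space between characters so that each one becomes a token
        return ' '.join(list(s))
    s1, s2 = add_space(s1), add_space(s2)
    cv = CountVectorizer(tokenizer=lambda s: s.split())
    corpus = [s1, s2]
    vectors = cv.fit_transform(corpus).toarray()
    return np.dot(vectors[0], vectors[1]) / (norm(vectors[0]) * norm(vectors[1]))

# document: the list of 8371 book strings read from the CSV (loading code not shown)
boolist = []
for j in range(0, 1):  # only the first book so far
    titles = []
    for i in range(len(document)):
        titles.append(tf_similarity(str(document[i]), str(document[j])))
    df = pd.DataFrame()
    df["document"] = document
    df["titles"] = titles  # similarity score of every book to book j
    dff = df.sort_values("titles", ascending=False).head(6)
    newdff = dff.values.T.tolist()
    new = newdff[0]  # the 6 documents with the highest scores
    for k in range(0, 2):
        boolist.append(new)  # note: this appends the same top-6 list twice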
I would like the list to look like this:
book1 top1 top2 top3 top4 top5 top6
book2 top1 top2 top3 top4 top5 top6
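
One possible way to get a row per book is to vectorize all the books once and compare them in chunks, rather than calling tf_similarity for every pair. The sketch below is not from the original post: `char_tokens`, `top_similar`, `top_n` and `chunk_size` are hypothetical names, and it assumes `document` is a list of strings. It uses the same character-count cosine similarity, but it leaves each book out of its own top 6, whereas `sort_values(...).head(6)` on the full score column would normally list the book itself first. The sample titles at the bottom are made up.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import normalize


def char_tokens(s):
    # Same tokens as add_space + split: every non-whitespace character
    return [ch for ch in s if not ch.isspace()]


def top_similar(documents, top_n=6, chunk_size=500):
    """Return one row per document: [document, top1, ..., top_n]."""
    docs = [str(d) for d in documents]
    counts = CountVectorizer(tokenizer=char_tokens).fit_transform(docs)
    # L2-normalise each row so that a dot product equals cosine similarity
    unit = normalize(counts.astype(np.float64))
    rows = []
    for start in range(0, len(docs), chunk_size):
        # Similarity of this chunk of books against every book (dense chunk)
        sims = (unit[start:start + chunk_size] @ unit.T).toarray()
        for offset, scores in enumerate(sims):
            i = start + offset
            order = np.argsort(-scores, kind="stable")  # highest score first
            best = [k for k in order if k != i][:top_n]  # skip the book itself
            rows.append([docs[i]] + [docs[k] for k in best])
    return rows


# Made-up sample data, only to show the shape of the result
sample = ["python cookbook", "python cook book", "learning python", "deep sea fishing"]
for row in top_similar(sample, top_n=2):
    print(row)
```

With the real data, `top_similar(document)` should give one row per book with 7 entries each (the book followed by its 6 matches), and `pd.DataFrame(rows)` would turn that into the table layout above. Working in chunks avoids holding a full 8371 x 8371 dense matrix in memory at once.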