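I am building a recommender system (for articles) using content-based filtering, with cosine similarity as the similarity measure.
I tried a script (from a tutorial) that basically returns "n" articles based on a given/already-read article.
This is the script I tried: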
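import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

ds = pd.read_csv("Booksss.csv")
# TF-IDF over 1- to 3-word n-grams of the titles, ignoring English stop words.
# min_df=1 keeps every term, as min_df=0 did; recent scikit-learn versions reject the integer 0.
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=1, stop_words='english')
tfidf_matrix = tf.fit_transform(ds['Title'])
# pairwise cosine similarity between all titles
cosine_similarities = cosine_similarity(tfidf_matrix, tfidf_matrix)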
results = {}  # maps each book ID to a list of (score, ID) pairs for its most similar books
for idx, row in ds.iterrows():
    # indices of the 4 highest scores, best first; the top one is normally the book itself
    similar_indices = cosine_similarities[idx].argsort()[:-5:-1]
    similar_items = [(cosine_similarities[idx][i], ds['ID'][i]) for i in similar_indices]
    results[row['ID']] = similar_items[1:]  # drop the first entry, keeping 3 neighbours
# look up the title that belongs to a book ID
def item(id):
    return ds.loc[ds['ID'] == id]['Title'].tolist()[0]

def recommend(id, num):
    if (num == 0):
        print("Unable to recommend any book as you have not chosen the number of book to be recommended")
    elif (num==1):
        print("Recommending " + str(num) + " book similar to " + item(id))
    else :
        print("Recommending " + str(num) + " books similar to " + item(id))
    print("----------------------------------------------------------")
    recs = results[id][:num]
    test=[]
    for rec in recs:
        test.append(item(rec[1]))
        #print("You may also like to read: " + item(rec[1]) + " (score:" + str(rec[0]) + ")")
    return test,item(id)
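The recommend(id, num) function takes a single book ID and the number n of articles to recommend. How should I change the script so that it takes multiple IDs that the user has already read and returns a list of book IDs recommended on the basis of all of those IDs?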
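To make the goal concrete, here is a rough sketch of one possible direction, not a confirmed solution: score every book by its average cosine similarity to the books already read, then return the best-scoring unread IDs. The function name recommend_for_many and the small in-memory catalogue are assumptions made for illustration; the catalogue only stands in for Booksss.csv with the same 'ID' and 'Title' columns.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up catalogue standing in for Booksss.csv (assumption: same 'ID' and 'Title' columns).
ds = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Title": [
        "Python machine learning basics",
        "Advanced machine learning with Python",
        "Italian cooking for beginners",
        "Deep learning with Python",
        "French cooking recipes",
    ],
})

tf = TfidfVectorizer(analyzer="word", ngram_range=(1, 3), stop_words="english")
tfidf_matrix = tf.fit_transform(ds["Title"])


def recommend_for_many(read_ids, num):
    """Hypothetical helper: return up to `num` unread book IDs similar to `read_ids`."""
    ids = ds["ID"].to_numpy()
    read_mask = np.isin(ids, list(read_ids))
    if not read_mask.any():
        return []  # none of the given IDs exist in the catalogue
    # Similarity of every book to each read book, averaged over the read books.
    sims = cosine_similarity(tfidf_matrix, tfidf_matrix[np.flatnonzero(read_mask)])
    scores = sims.mean(axis=1)
    scores[read_mask] = -np.inf  # never recommend something already read
    top = np.argsort(scores)[::-1][:num]  # highest average similarity first
    return [int(ids[i]) for i in top if np.isfinite(scores[i])]
```

With this toy data, recommend_for_many([1, 4], 1) returns [2], since book 2 is the only unread title that shares terms with the two read titles. Averaging the similarities is only one choice; taking the maximum per book, or weighting recently read books more heavily, would fit the same structure.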