Cosine similarity based on multiple articles a user has read

Date: 2019-03-22 14:29:44

Tags: python cosine-similarity

I am building a recommendation system (for articles) using a content-based filtering approach. For this I am using cosine similarity.
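For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal sketch in plain Python follows; the function name `cosine` and the zero-vector convention are assumptions made for illustration, not part of the tutorial script.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention assumed here for all-zero vectors
    return dot / (norm_u * norm_v)
```

Vectors pointing in the same direction score 1, and vectors with no shared non-zero component score 0.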

I tried a script (from a tutorial) that basically returns n articles based on a single given (already read) article.

This is the script I tried:

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

ds = pd.read_csv("Booksss.csv")

# min_df=1 is the default and keeps every term; newer scikit-learn releases reject an integer 0
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=1, stop_words='english')

tfidf_matrix = tf.fit_transform(ds['Title'])
cosine_similarities = cosine_similarity(tfidf_matrix, tfidf_matrix)

results = {}  # maps each ID to a list of (score, item_id) tuples

# assumes ds keeps its default 0..n-1 index, so idx is also the row position in the matrix
for idx, row in ds.iterrows():
    # take the 6 highest scores; the first is normally the book itself,
    # so 5 similar books remain after the [1:] slice below
    similar_indices = cosine_similarities[idx].argsort()[:-7:-1]
    similar_items = [(cosine_similarities[idx][i], ds['ID'][i]) for i in similar_indices]
    results[row['ID']] = similar_items[1:]


def item(id):
    # look up the title for a given ID
    return ds.loc[ds['ID'] == id]['Title'].tolist()[0]


def recommend(id, num):
    if num == 0:
        print("Unable to recommend any book as you have not chosen the number of book to be recommended")
    elif num == 1:
        print("Recommending " + str(num) + " book similar to " + item(id))
    else:
        print("Recommending " + str(num) + " books similar to " + item(id))

    print("----------------------------------------------------------")
    recs = results[id][:num]
    test = []
    for rec in recs:
        test.append(item(rec[1]))
        # print("You may also like to read: " + item(rec[1]) + " (score:" + str(rec[0]) + ")")
    return test, item(id)

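The recommend(id, num) function takes a single book ID and the number n of articles to recommend. How should I change the script so that it takes multiple IDs that the user has already read and returns a list of book IDs based on those IDs?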
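One possible approach, offered as a sketch rather than a definitive answer: score every unread item by its mean cosine similarity to the items the user has read, then return the IDs with the highest scores. The names `recommend_for_user`, `read_ids`, `item_ids`, and `similarities` are hypothetical and do not come from the tutorial script. The sketch assumes `similarities` is a square matrix whose rows and columns follow the same order as `item_ids`.

```python
def recommend_for_user(read_ids, num, item_ids, similarities):
    """Return up to `num` unread item IDs, ranked by mean similarity to `read_ids`.

    item_ids     -- list of IDs, in the same order as the rows of `similarities`
    similarities -- square matrix (list of lists or NumPy array) of pairwise scores
    """
    position = {item_id: pos for pos, item_id in enumerate(item_ids)}
    read_positions = [position[i] for i in read_ids]  # raises KeyError for an unknown ID
    if not read_positions:
        return []  # nothing read yet, so there is nothing to compare against

    already_read = set(read_positions)
    scored = []
    for candidate in range(len(item_ids)):
        if candidate in already_read:
            continue  # never recommend something the user has already read
        mean_score = sum(similarities[r][candidate] for r in read_positions) / len(read_positions)
        scored.append((mean_score, item_ids[candidate]))

    # highest mean similarity first; ties keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:num]]
```

With the script above, a call might look like `recommend_for_user([3, 7], 5, ds['ID'].tolist(), cosine_similarities)`, where 3 and 7 are placeholder IDs. Averaging treats every read item equally; taking the maximum instead of the mean would favor items that are very close to any single read article.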

0 answers:

No answers