I am new to Python and Data Science. I am implementing a hybrid recommender system with LightFM, using a dataset from the UCI Machine Learning Repository:
https://archive.ics.uci.edu/ml/datasets/Restaurant+%26+consumer+data
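For simplicity, I have used only rating_final.csv, userprofile.csv and geoplaces2.csv (this is enough to reproduce the problem).
This is my first question, so sorry for any mistakes.
Data: https://github.com/RahulPriyadarshi3785/DAT210x/tree/master/Recommender%20System%20Trial
df1 is a DataFrame of the following form: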
user_row rest_col final_rating
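import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import sparse
from lightfm import LightFM
from lightfm.evaluation import precision_at_k, auc_score
df = pd.read_csv('C:/Users/hp/Documents/JupyterDemo/Data Food Items/rating_final.csv', sep = ',', header = 0)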
rest_name = pd.read_csv('C:/Users/hp/Documents/JupyterDemo/Data Food Items/geoplaces2.csv', sep = ',', header = 0, encoding = "ISO-8859-1")
users = pd.read_csv('C:/Users/hp/Documents/JupyterDemo/Data Food Items/userprofile.csv', sep = ',', header = 0)
rest_name = rest_name.loc[:,["placeID", "name"]]
# number the restaurants 1..n so that every row of geoplaces2.csv gets an ID
rest_name['rest_col'] = np.arange(1, rest_name.shape[0] + 1)
users = users.loc[:,['userID']]
df['user_row'] = pd.to_numeric(df.userID.str.replace('[^0-9]', "", regex=True), errors = 'coerce') - 1000
df['final_rating'] = np.mean(df.loc[:,["rating","food_rating","service_rating"]], axis = 1)
df1 = df.loc[:,["user_row","rest_col","final_rating"]]
df1 = df1.dropna(axis = 0)
df1[pd.isnull(df1).any(axis=1)]
df1['rest_col'] = df1['rest_col'].astype(int)
# user_row: row number, 1 to the number of users (138)
# rest_col: column number, 1 to the number of restaurants (130)
# final_rating: mean of rating, food_rating and service_rating
# Splitting the Data
from sklearn import model_selection
train_data, test_data = model_selection.train_test_split(df1, test_size = 0.25)
n_users = users.shape[0] # users is the DataFrame from userprofile.csv
n_items = rest_name.shape[0] # rest_name is the DataFrame from geoplaces2.csv
# Create the user-item matrices, one for training and one for testing
# initialise the training matrix with zeros
train_data_matrix = np.zeros((n_users, n_items))
# fill in the ratings from the training data
# (itertuples yields (index, user_row, rest_col, final_rating); IDs are 1-based)
for each_line in train_data.itertuples():
    train_data_matrix[each_line[1]-1, each_line[2]-1] = each_line[3]
# initialise the test matrix with zeros
test_data_matrix = np.zeros((n_users, n_items))
# fill in the ratings from the test data (not train_data)
for each_line in test_data.itertuples():
    test_data_matrix[each_line[1]-1, each_line[2]-1] = each_line[3]
train_data_matrix = sparse.coo_matrix(train_data_matrix)
test_data_matrix = sparse.coo_matrix(test_data_matrix)
model = LightFM(loss='warp')
#train model
model.fit(train_data_matrix, epochs=30, num_threads=2)
# Evaluate the trained model
print("Train precision: %.2f" % precision_at_k(model, train_data_matrix, k=5).mean()) # produces NaN
print("Test precision: %.2f" % precision_at_k(model, test_data_matrix, k=5).mean()) # produces NaN
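# Trying to compare the adagrad and adadelta learning schedules
alpha = 1e-3  # assumed value: alpha is not defined in the original snippet
epochs = 30   # assumed value: epochs is not defined in the original snippet
adagrad_model = LightFM(no_components=30,
                        loss='warp',
                        learning_schedule='adagrad',
                        user_alpha=alpha,
                        item_alpha=alpha)
adadelta_model = LightFM(no_components=30,
                         loss='warp',
                         learning_schedule='adadelta',
                         user_alpha=alpha,
                         item_alpha=alpha)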
adagrad_auc = []
for epoch in range(epochs):
    adagrad_model.fit_partial(train_data_matrix, epochs=1)
    adagrad_auc.append(auc_score(adagrad_model, test_data_matrix).mean())
adadelta_auc = []
for epoch in range(epochs):
    adadelta_model.fit_partial(train_data_matrix, epochs=1)
    adadelta_auc.append(auc_score(adadelta_model, test_data_matrix).mean())
x = np.arange(len(adagrad_auc))
plt.plot(x, np.array(adagrad_auc))
plt.plot(x, np.array(adadelta_auc))
plt.legend(['adagrad', 'adadelta'], loc='lower right')
plt.show()
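As an aside on the matrix-building step above, the dense np.zeros loop can be replaced by constructing the sparse matrix directly from the DataFrame columns. A minimal sketch, assuming df1 has the user_row, rest_col and final_rating columns with 1-based IDs; build_interactions is a hypothetical helper and the toy rows are made up, not taken from the UCI data:

```python
import numpy as np
import pandas as pd
from scipy import sparse


def build_interactions(ratings, n_users, n_items, drop_zeros=True):
    """Hypothetical helper: turn a (user_row, rest_col, final_rating) frame
    into an n_users x n_items COO matrix. IDs are assumed to be 1-based."""
    if drop_zeros:
        # the dense loop cannot tell a 0 rating from "no rating", so mimic that
        ratings = ratings[ratings["final_rating"] > 0]
    rows = ratings["user_row"].to_numpy(dtype=np.int64) - 1
    cols = ratings["rest_col"].to_numpy(dtype=np.int64) - 1
    data = ratings["final_rating"].to_numpy(dtype=np.float32)
    return sparse.coo_matrix((data, (rows, cols)), shape=(n_users, n_items))


# Toy data standing in for df1 (made up for illustration)
toy = pd.DataFrame({
    "user_row": [1, 1, 2, 3, 2],
    "rest_col": [1, 3, 2, 3, 3],
    "final_rating": [2.0, 1.0, 1.5, 0.5, 0.0],
})
mat = build_interactions(toy, n_users=3, n_items=3)
print(mat.toarray())
```

One behavioural difference from the loop: if the same (user_row, rest_col) pair appears twice, COO keeps both entries and sums them on conversion, whereas the loop keeps only the last value.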
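Problems:
1) NaN when computing precision on the training and test datasets
2) A "Mean of empty slice" warning
3) A FutureWarning related to .loc(), recommending .reindex() instead
Please help, I think I'm stuck.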
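For context on problems 1 and 2: NumPy returns nan, with a RuntimeWarning "Mean of empty slice", when it averages an empty array. A minimal sketch, assuming (as LightFM's evaluation functions do with the default preserve_rows=False) that users with no interactions are dropped before the mean is taken; the per-user scores here are dummy values, not LightFM output:

```python
import warnings

import numpy as np
from scipy import sparse

# An all-zero interaction matrix, like one built from an empty df1
interactions = sparse.coo_matrix((3, 4))
# Dummy per-user scores and a mask of users that have at least one interaction
per_user_scores = np.full(interactions.shape[0], 0.5)
csr = interactions.tocsr()
has_interactions = np.diff(csr.indptr) > 0  # stored entries per row > 0
kept = per_user_scores[has_interactions]    # empty: no user is kept

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mean_score = kept.mean()                # nan, plus a RuntimeWarning

print(kept.size, mean_score)
```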
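Answer 0 (score: 0)
The df DataFrame does not contain a rest_col column (it only exists in rest_name), so df.loc[:,["user_row","rest_col","final_rating"]] fills that column entirely with NaN, which is what the .loc FutureWarning is about. dropna() then removes every row, so df1 ends up empty and the precision means come out as NaN. Merge rest_name into df on placeID before building df1: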
df = pd.merge(df, rest_name, on='placeID', how='inner')
df1 = df.loc[:,["user_row","rest_col","final_rating"]]
Now df1 is created without any NaN values, and the final output works as expected.
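A minimal sketch of the failure and the fix on hypothetical toy data (the IDs and ratings below are made up, not taken from the UCI files). Because current pandas raises a KeyError instead of filling missing .loc labels with NaN, the "before" case uses reindex(columns=...), the replacement the FutureWarning points to, which keeps the NaN-filling behaviour:

```python
import numpy as np
import pandas as pd

# Toy stand-ins for rating_final.csv and geoplaces2.csv (made up)
df = pd.DataFrame({
    "userID": ["U1001", "U1002", "U1002"],
    "placeID": [135085, 132825, 135085],
    "final_rating": [2.0, 1.0, 1.5],
})
rest_name = pd.DataFrame({"placeID": [132825, 135085], "name": ["A", "B"]})
rest_name["rest_col"] = np.arange(1, rest_name.shape[0] + 1)
df["user_row"] = pd.to_numeric(
    df["userID"].str.replace("[^0-9]", "", regex=True), errors="coerce") - 1000

cols = ["user_row", "rest_col", "final_rating"]

# Before the fix: df has no rest_col, so it becomes all-NaN and dropna empties the frame
before = df.reindex(columns=cols).dropna(axis=0)

# After the fix: the inner merge on placeID brings rest_col into df
merged = pd.merge(df, rest_name, on="placeID", how="inner")
after = merged.loc[:, cols].dropna(axis=0).astype({"rest_col": int})

print(len(before), len(after))
```

Note that how='inner' also drops any rating whose placeID does not appear in geoplaces2.csv.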