In LightFM, precision_at_k returns NaN and fit_partial produces an empty plot with "RuntimeWarning: Mean of empty slice"

Time: 2018-01-23 01:39:38

Tags: python

I am new to Python and data science. I am implementing a hybrid recommender system with LightFM, using a dataset from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Restaurant+%26+consumer+data

For simplicity, I use only rating_final.csv, userprofile.csv and geoplaces2.csv. (Mentioned so that the problem can be reproduced.)

This is my first question, so I apologize for any mistakes.

The code can be found here: https://github.com/RahulPriyadarshi3785/DAT210x/blob/master/Recommender%20System%20Trial/Recommender%20System.ipynb

Data: https://github.com/RahulPriyadarshi3785/DAT210x/tree/master/Recommender%20System%20Trial

df1 is a DataFrame of the following form:

user_row rest_col final_rating

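import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import sparse
from lightfm import LightFM
from lightfm.evaluation import precision_at_k, auc_score

df = pd.read_csv('C:/Users/hp/Documents/JupyterDemo/Data Food Items/rating_final.csv', sep = ',', header = 0)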
rest_name = pd.read_csv('C:/Users/hp/Documents/JupyterDemo/Data Food Items/geoplaces2.csv', sep = ',', header = 0, encoding = "ISO-8859-1")
users = pd.read_csv('C:/Users/hp/Documents/JupyterDemo/Data Food Items/userprofile.csv', sep = ',', header = 0)


rest_name = rest_name.loc[:,["placeID", "name"]]
rest_name['rest_col'] = np.arange(1, rest_name.shape[0] + 1)
users = users.loc[:,['userID']]
df['user_row'] = pd.to_numeric(df.userID.str.replace(r'[^0-9]', "", regex = True), errors = 'coerce') - 1000
df['final_rating'] = np.mean(df.loc[:,["rating","food_rating","service_rating"]], axis = 1)

df1 = df.loc[:,["user_row","rest_col","final_rating"]]
df1 = df1.dropna(axis = 0)
df1[pd.isnull(df1).any(axis=1)]
df1['rest_col'] = df1['rest_col'].astype(int)

# user_row is the row number, from 1 to the number of users (138)
# rest_col is the column number, from 1 to the number of restaurants (130)
# final_rating is the mean of rating, food_rating and service_rating

# Splitting the Data
from sklearn import model_selection
train_data, test_data = model_selection.train_test_split(df1, test_size = 0.25)

n_users = users.shape[0] # users is the DataFrame from userprofile.csv
n_items = rest_name.shape[0] # rest_name is the DataFrame from geoplaces2.csv

# Creating User-Item Matrices, one for Training and other for testing

# instantiating the training data matrix with zeros
train_data_matrix = np.zeros((n_users, n_items))
# iterating through the training data and filling in the ratings
for each_line in train_data.itertuples():
    train_data_matrix[each_line[1]-1, each_line[2]-1] = each_line[3]

# instantiating the testing data matrix with zeros
test_data_matrix = np.zeros((n_users, n_items))
# iterating through the testing data and filling in the ratings
for each_line in test_data.itertuples():
    test_data_matrix[each_line[1]-1, each_line[2]-1] = each_line[3]

train_data_matrix = sparse.coo_matrix(train_data_matrix)
test_data_matrix = sparse.coo_matrix(test_data_matrix)


model = LightFM(loss='warp')

# Train the model

model.fit(train_data_matrix, epochs=30, num_threads=2)


# Evaluate the trained model

print("Train precision: %.2f" % precision_at_k(model, train_data_matrix, k=5).mean())  # produces NaN


print("Test precision: %.2f" % precision_at_k(model, test_data_matrix, k=5).mean())  # produces NaN

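# Trying out the adagrad and adadelta learning schedules

# alpha and epochs are not defined in the original post;
# the values below are placeholders assumed for illustration
alpha = 1e-3
epochs = 70

adagrad_model = LightFM(no_components=30,
                        loss='warp',
                        learning_schedule='adagrad',
                        user_alpha=alpha,
                        item_alpha=alpha)
adadelta_model = LightFM(no_components=30,
                         loss='warp',
                         learning_schedule='adadelta',
                         user_alpha=alpha,
                         item_alpha=alpha)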

adagrad_auc = []

for epoch in range(epochs):
    adagrad_model.fit_partial(train_data_matrix, epochs=1)
    adagrad_auc.append(auc_score(adagrad_model, test_data_matrix).mean())


adadelta_auc = []

for epoch in range(epochs):
    adadelta_model.fit_partial(train_data_matrix, epochs=1)
    adadelta_auc.append(auc_score(adadelta_model, test_data_matrix).mean())

x = np.arange(len(adagrad_auc))
plt.plot(x, np.array(adagrad_auc))
plt.plot(x, np.array(adadelta_auc))
plt.legend(['adagrad', 'adadelta'], loc='lower right')
plt.show()

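Problems:

1) NaN when computing precision on both the test and training datasets

2) RuntimeWarning: Mean of empty slice

3) A FutureWarning related to .loc(), suggesting .reindex() instead

Please help, I think I am stuck.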

1 Answer:

Answer 0: (score: 0)

The "df" DataFrame does not contain rest_col, so that column is filled entirely with NaN.
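
To see how a missing column can end up as NaN scores, here is a minimal sketch on made-up toy data (the IDs and ratings are hypothetical, not taken from the dataset). It uses `reindex`, because recent pandas versions raise a KeyError for a missing label in `.loc` instead of filling in NaN. The last part only imitates an evaluation step that averages over users who have at least one interaction; that this is where "Mean of empty slice" comes from is an assumption, not something confirmed in the post.

```python
import warnings

import numpy as np
import pandas as pd
from scipy import sparse

# Toy ratings frame (hypothetical values); note there is no 'rest_col' column.
df = pd.DataFrame({
    "user_row": [1, 2, 3],
    "placeID": [135085, 135038, 132825],
    "final_rating": [2.0, 1.0, 1.5],
})

# Asking for a column that does not exist yields a column of NaN.
df1 = df.reindex(columns=["user_row", "rest_col", "final_rating"])
all_nan = bool(df1["rest_col"].isna().all())

# dropna then removes every row, so nothing is left to fill the matrix with.
df1 = df1.dropna(axis=0)
rows_left = len(df1)

# An all-zero interaction matrix has no users with interactions.
interactions = sparse.csr_matrix(np.zeros((3, 4)))
per_user_score = np.array([0.2, 0.4, 0.6])  # stand-in for per-user precision
kept = per_user_score[interactions.getnnz(axis=1) > 0]

# The mean of an empty array is NaN and triggers
# "RuntimeWarning: Mean of empty slice" (silenced here).
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    mean_score = kept.mean()

print(all_nan, rows_left, kept.size, mean_score)
```

If the matrices in the question were built from such an emptied frame, every evaluation mean would be taken over zero users, which would match the NaN precision and the empty plot.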

df = pd.merge(df, rest_name, on='placeID', how='inner')
df1 = df.loc[:,["user_row","rest_col","final_rating"]]

Now the DataFrame is created without any NaN, and the final output will be fine.
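
One way to confirm this is to check for NaN right after the merge and then build the sparse matrix straight from the index columns. The sketch below uses small hypothetical frames in place of the real CSV files; the place IDs, names and ratings are made up, and `to_interactions` is a hypothetical helper, not part of LightFM.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical stand-ins for rating_final.csv and geoplaces2.csv.
df = pd.DataFrame({
    "user_row": [1, 1, 2, 3],
    "placeID": [101, 102, 101, 103],
    "final_rating": [2.0, 1.0, 1.5, 0.5],
})
rest_name = pd.DataFrame({"placeID": [101, 102, 103], "name": ["A", "B", "C"]})
rest_name["rest_col"] = np.arange(1, len(rest_name) + 1)

# The merge brings rest_col into the ratings frame.
df = pd.merge(df, rest_name, on="placeID", how="inner")
df1 = df.loc[:, ["user_row", "rest_col", "final_rating"]]
has_nan = bool(df1.isna().any().any())


def to_interactions(frame, n_users, n_items):
    """Build a COO matrix from 1-based user_row / rest_col indices."""
    rows = frame["user_row"].to_numpy() - 1
    cols = frame["rest_col"].to_numpy() - 1
    vals = frame["final_rating"].to_numpy()
    return sparse.coo_matrix((vals, (rows, cols)), shape=(n_users, n_items))


matrix = to_interactions(df1, n_users=3, n_items=3)
print(has_nan, matrix.nnz, matrix.shape)
```

With the real data, `n_users` and `n_items` would be `users.shape[0]` and `rest_name.shape[0]`, as in the question.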