How to evaluate an implicitly trained model in Spark MLlib

Time: 2016-11-04 19:25:38

Tags: apache-spark pyspark apache-spark-mllib

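I am using Spark MLlib to train an ALS model on implicit ratings. My data looks like this: user_id, item_id, number_of_purchase. After reading about implicit ALS training, it seems the resulting matrix is a preference matrix with values roughly between 0 and 1. My question is how to evaluate it against the test data: the test matrix contains number_of_purchase, so if I understand correctly, RMSE cannot be used directly.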

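from pyspark.mllib.recommendation import ALS, Rating

# Load and parse the data (`sc` is assumed to be an existing SparkContext, as in the pyspark shell)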
data = sc.textFile("dataset_60k.txt")
training, test = data.randomSplit([0.8, 0.2])

train_ratings = training.map(lambda l: l.split(',')).map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))
test_ratings = test.map(lambda l: l.split(',')).map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))

# Build the recommendation model using Alternating Least Squares
rank = 10
numIterations = 10

model = ALS.trainImplicit(train_ratings, rank=rank, iterations=numIterations)

# Evaluate the model on the held-out test data
testdata = test_ratings.map(lambda p: (p[0], p[1]))

# These predictions are preference scores, not purchase counts
predictions = model.predictAll(testdata).map(lambda r: ((r[0], r[1]), r[2]))

ratesAndPreds = test_ratings.map(lambda r: ((r[0], r[1]), r[2])).join(predictions)
MSE = ratesAndPreds.map(lambda r: (r[1][0] - r[1][1]) ** 2).mean()
print("Mean Squared Error = " + str(MSE))
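
One possible way to frame the evaluation, offered as a sketch rather than a definitive answer: in the implicit-feedback formulation, the model approximates a binary preference (1 if the user bought the item at all, 0 otherwise), and the purchase count only acts as a confidence weight. Under that assumption, the test counts can be binarized before computing an error, or the model can be scored with a ranking metric such as precision at k. The helper names (`to_preference`, `preference_rmse`, `precision_at_k`) and the toy data below are hypothetical and written in plain Python so the logic is easy to check; the same functions could be applied inside the RDD `map` calls above.

```python
import math


def to_preference(purchase_count):
    """Binarize an implicit count: 1.0 if the item was bought at all, else 0.0."""
    return 1.0 if purchase_count > 0 else 0.0


def preference_rmse(pairs):
    """RMSE between binarized counts and predicted preference scores.

    pairs: list of (purchase_count, predicted_score) tuples.
    """
    squared_errors = [(to_preference(count) - pred) ** 2 for count, pred in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items found in the held-out set."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / float(k)


# Hypothetical toy data: (number_of_purchase, predicted preference score)
pairs = [(3, 0.9), (1, 0.4), (7, 1.0)]
print("Preference RMSE = " + str(preference_rmse(pairs)))

# Hypothetical ranked recommendations vs. items the user bought in the test split
print("Precision@2 = " + str(precision_at_k([10, 20, 30, 40], {20, 40, 50}, k=2)))
```

Note that every pair observed in the test split has a positive count, so its binarized target is always 1.0; an error computed this way says little about how items are ranked. For that reason a ranking metric is often the more informative choice, and `pyspark.mllib.evaluation.RankingMetrics` (with `precisionAt`, `ndcgAt`, and `meanAveragePrecision`) may be worth a look.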

0 answers:

No answers