Question

我有以下问题：我正在研究当今最常用的推荐算法的准确性。

因此，一种衡量其性能的方法是通过检查它们在给定数据集的不同大小（即评级矩阵中的稀疏性）下预测某个值的能力如何。

我需要找到一种方法来计算均方根误差（或mae）（某种度量）与数据集中的稀疏性。例如，让我们看一下下面的图片：

您会看到它说：

“ RMSE是稀疏性的函数。在每次迭代中，从训练集中删除5000个评分（最初包含80000个评分）。 “

我正在使用Python和Movielens数据集。您知道如何用上述语言实现吗？有什么工具可以做到这一点吗？

使用Surprise包的小示例：

from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split

# Load the movielens-100k dataset (download it if needed),
data = Dataset.load_builtin('ml-100k')

# sample random trainset and testset
# test set is made of 25% of the ratings.
trainset, testset = train_test_split(data, test_size=.25)

# We'll use the famous SVD algorithm.
algo = SVD()

# Train the algorithm on the trainset, and predict ratings for the testset
algo.fit(trainset)
predictions = algo.test(testset)

# Then compute RMSE
accuracy.rmse(predictions)

Source to Surprise

减少python中的协作式过滤推荐系统的训练样本数量

0 个答案: