Matrix factorization and gradient descent

Date: 2015-04-02 23:41:22

Tags: vector gradient-descent

I am following the paper found here and trying to implement batch gradient descent (BGD) instead of the stochastic gradient descent (SGD) described in the paper.

For SGD, what I gather is that you do this (pseudocode):

for each user's actual rating {

    1. calculate the difference between the actual rating
       and the rating predicted by the dot product
       of the two factor vectors (user vector and item vector).

    2. multiply the result of 1. by the item vector
       corresponding to that rating.

    3. update the user vector by the quantity
       calculated in 2. multiplied by lambda, e.g.:

            userVector = userVector + lambda * (result of 2.)
}

Repeat for every user.

Do the same for every item, except in 2. multiply by the user vector instead of the item vector.

Go back to the start and repeat until some stopping point.
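The steps above can be sketched in Python/NumPy. The names here are my own (`sgd_epoch`, `ratings` as a list of `(user, item, rating)` triples, factor matrices `U` and `V`), not taken from the paper, and `lam` plays the role of lambda in the question:

```python
import numpy as np

def sgd_epoch(ratings, U, V, lam=0.002):
    """One SGD pass over the observed ratings.

    ratings -- iterable of (u, i, r) triples (assumed format)
    U, V    -- user and item factor matrices, shapes (n_users, k) and (n_items, k)
    lam     -- the learning rate (lambda in the question)
    """
    # First pass: update user vectors, one observed rating at a time.
    for u, i, r in ratings:
        err = r - U[u] @ V[i]      # 1. actual rating minus dot-product prediction
        U[u] += lam * err * V[i]   # 2.-3. move user vector along the item vector
    # Second pass: the same update for the item vectors,
    # multiplying by the user vector instead.
    for u, i, r in ratings:
        err = r - U[u] @ V[i]
        V[i] += lam * err * U[u]
    return U, V
```

Repeating `sgd_epoch` until the total squared error stops improving corresponds to "go back to the start and repeat until some stopping point".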

For BGD, what I do is:

for each user {

    1. sum up all of their prediction errors, e.g.
       (real rating - (user vector . item vector)) * item vector
    2. update the user vector by the sum from 1. multiplied by lambda.
}

Then repeat for the items, swapping the item vector in 2. for the user vector.
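The per-user batch step just described might look like this. This is a sketch under my own assumptions: `ratings_by_user[u]` holds user `u`'s `(item, rating)` pairs, `ratings_by_item[i]` the mirror image, and the function name is hypothetical:

```python
import numpy as np

def per_user_batch_step(ratings_by_user, ratings_by_item, U, V, lam=0.002):
    # For each user: sum the gradient over all of that user's ratings
    # (step 1.), then apply the summed update once (step 2.).
    for u, pairs in ratings_by_user.items():
        grad = np.zeros_like(U[u])
        for i, r in pairs:
            err = r - U[u] @ V[i]
            grad += err * V[i]        # (real rating - u.v) * item vector
        U[u] += lam * grad
    # Then the same for every item, swapping in the user vector.
    for i, pairs in ratings_by_item.items():
        grad = np.zeros_like(V[i])
        for u, r in pairs:
            err = r - U[u] @ V[i]
            grad += err * U[u]
        V[i] += lam * grad
    return U, V
```

The key property is that the accumulator `grad` is a vector, summed only over one user's (or one item's) ratings before being applied.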

This seems to make sense, but on further reading I got confused about BGD. It says that BGD must go through the entire dataset to make a single update. Does that mean what I have done, i.e. the entire dataset relevant to that particular user, or literally the entire dataset?

I made an implementation that runs through the entire dataset, summing every prediction error, and then uses that single number to update every user vector (so all user vectors are updated by the same amount!). However, even with a lambda of 0.002 it does not approach a minimum and fluctuates rapidly. It can go from an average error of 12,500 to 1.2, then to -539, and so on. Eventually the numbers approach infinity and my program fails.
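For contrast with the single-scalar-sum version described above, the conventional full-batch step accumulates a separate gradient vector for every user and every item over the whole dataset, and only then applies all the updates at once. A minimal sketch, with hypothetical names and the same `(u, i, r)` triple format assumed earlier:

```python
import numpy as np

def full_batch_step(ratings, U, V, lam=0.002):
    # One gradient accumulator per user vector and per item vector --
    # not one shared scalar for all users.
    gU = np.zeros_like(U)
    gV = np.zeros_like(V)
    for u, i, r in ratings:
        err = r - U[u] @ V[i]
        gU[u] += err * V[i]
        gV[i] += err * U[u]
    # Apply every update in a single step, after the full pass.
    U += lam * gU
    V += lam * gV
    return U, V
```

Each user vector is only moved by the errors of that user's own ratings, even though the pass covers the entire dataset before any update is applied.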

Any help with the maths behind this would be great.

0 Answers:

No answers yet