Recommender using the Pearson correlation coefficient

Time: 2016-09-23 11:31:29

Tags: c# correlation pearson

I have a question about using the Pearson correlation coefficient in a recommender system.

I currently have 3 collections in my database: one for users, one for restaurants, and one for reviews.

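I wrote a function that takes 2 user IDs and the lists of reviews those users submitted, and returns a double: the Pearson correlation coefficient between the 2 users, based on their reviews.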

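The function starts from 2 lists holding all the reviews each user submitted. A loop then checks whether both users have reviewed the same restaurant and collects those reviews in a separate list. That list is used to calculate the coefficient.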
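For reference, the quantity the function is meant to return can be written as the sample Pearson coefficient in its sum-based form. The notation is introduced here for illustration: n is the number of restaurants both users reviewed, and x_i and y_i are the ratings user 1 and user 2 gave to shared restaurant i.

```latex
r = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}
         {\sqrt{\left(\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right)
                \left(\sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2\right)}}
```

The value lies between -1 and 1. It is undefined when either user gave the same rating to every shared restaurant, because the denominator is then 0.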

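I am just wondering whether I am using this coefficient in the right way. I want to give recommendations to the first user. Can I use this coefficient as a good indicator of which other users are a good match for that user?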

If it is not a good way to match users, what would be the best way?
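To make the intended use concrete, here is a minimal sketch of ranking the other users by their correlation with the first user, which is how a similarity score is commonly used in user-based collaborative filtering. Everything in it is an assumption made for illustration and not part of the code in this question: the class NeighbourRanking, the methods Pearson and RankNeighbours, the in-memory ratingsByUser dictionary (user id to restaurant id to rating), and the choice to skip users with fewer than 4 shared restaurants instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class NeighbourRanking
{
    // Pearson correlation over the restaurants both users rated.
    // Ratings are keyed by restaurant id. Returns null when the users
    // share fewer than minShared restaurants, and 0 when one user gave
    // the same rating to every shared restaurant.
    public static double? Pearson(IReadOnlyDictionary<Guid, double> a,
                                  IReadOnlyDictionary<Guid, double> b,
                                  int minShared = 4)
    {
        var shared = a.Keys.Where(id => b.ContainsKey(id)).ToList();
        if (shared.Count < minShared)
            return null;

        double n = shared.Count;
        double sumA = shared.Sum(id => a[id]);
        double sumB = shared.Sum(id => b[id]);
        double sumAA = shared.Sum(id => a[id] * a[id]);
        double sumBB = shared.Sum(id => b[id] * b[id]);
        double sumAB = shared.Sum(id => a[id] * b[id]);

        double numerator = sumAB - sumA * sumB / n;
        double denominator = Math.Sqrt((sumAA - sumA * sumA / n) * (sumBB - sumB * sumB / n));
        return denominator == 0 ? 0 : numerator / denominator;
    }

    // Returns the other users ordered from most to least similar to the
    // target user, leaving out users without enough shared restaurants.
    public static List<(Guid UserId, double Similarity)> RankNeighbours(
        Guid targetId,
        IReadOnlyDictionary<Guid, Dictionary<Guid, double>> ratingsByUser)
    {
        var target = ratingsByUser[targetId];
        return ratingsByUser
            .Where(kv => kv.Key != targetId)
            .Select(kv => (UserId: kv.Key, Similarity: Pearson(target, kv.Value)))
            .Where(x => x.Similarity.HasValue)
            .Select(x => (UserId: x.UserId, Similarity: x.Similarity.Value))
            .OrderByDescending(x => x.Similarity)
            .ToList();
    }
}
```

Users at the top of the returned list rate restaurants most like the first user does, so restaurants they rated highly and the first user has not reviewed yet would be natural candidates to recommend. Users with a negative similarity are probably best ignored for that purpose.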

In case anyone wants to know, here is my function for calculating the coefficient.

//Requires: using System; using System.Collections.Generic; using System.Linq;
public static double CalculatePearsonCorrelation(Guid userId1, List<Review> user1Reviews, 
                                                Guid userId2, List<Review> user2Reviews)
    {
        //Resetting the dictionary
        restaurantRecommendations = new Dictionary<Guid, List<Review>>();
        //Matching the reviews with the corresponding user
        restaurantRecommendations.Add(userId1, user1Reviews);
        restaurantRecommendations.Add(userId2, user2Reviews);
        //Check if users have enough reviews to get a correct correlation
        if (user1Reviews.Count < 4)
            throw new NotEnoughReviewsException("UserId " + userId1 + " doesn't contain enough reviews for this correlation");
        if (user2Reviews.Count < 4)
            throw new NotEnoughReviewsException("UserId " + userId2 + " doesn't contain enough reviews for this correlation");
        //This will be the list of user 1's reviews whose subject was also reviewed by user 2
        List<Review> sharedItems = new List<Review>();
        //Loops through the list of reviews of the selected user (userId1)
        foreach (var item in user1Reviews)
        {
            //Checks if the other user reviewed the same subject
            if (user2Reviews.Any(x => x.subj.Id == item.subj.Id))
            {
                //Adds these reviews to a list on which the correlation will be based
                sharedItems.Add(item);
            }
        }
        //If they don't have anything in common, the correlation will be 0
        if (sharedItems.Count == 0)
            return 0;
        //I decided users need at least 4 subjects in common, else there won't be an accurate correlation
        if (sharedItems.Count < 4)
            throw new NotEnoughReviewsException("UserId " + userId1 + " and UserId " + userId2 + " don't have enough reviews in common for a correlation");
        ////////////////////////// Calculating the pearson correlation //////////////////////////
        int n = sharedItems.Count;
        double sum1 = 0;        //Sum of user 1's ratings
        double sum2 = 0;        //Sum of user 2's ratings
        double sumSquares1 = 0; //Sum of user 1's squared ratings
        double sumSquares2 = 0; //Sum of user 2's squared ratings
        double sumProducts = 0; //Sum of the products of both users' ratings
        foreach (Review item in sharedItems)
        {
            //Look up each user's rating for this subject once
            double rating1 = user1Reviews.First(x => x.subj.Id == item.subj.Id).rating;
            double rating2 = user2Reviews.First(x => x.subj.Id == item.subj.Id).rating;
            sum1 += rating1;
            sum2 += rating2;
            sumSquares1 += rating1 * rating1;
            sumSquares2 += rating2 * rating2;
            sumProducts += rating1 * rating2;
        }
        //Calculate pearson correlation
        double numerator = sumProducts - (sum1 * sum2 / n);
        double denominator = Math.Sqrt((sumSquares1 - sum1 * sum1 / n) *
                                       (sumSquares2 - sum2 * sum2 / n));
        //A user who gave every shared subject the same rating has no variance, so return 0
        if (denominator == 0)
            return 0;

        return numerator / denominator;
    }
}

0 answers:

No answers