Tanimoto coefficient in Mahout returns only 1.0 as the predicted value

Date: 2015-04-27 17:50:15

Tags: java mahout recommendation-engine mahout-recommender collaborative-filtering

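I am trying to run the Mahout framework and use the Tanimoto coefficient on my item set. It runs, but it returns the value 1.0 for every predicted item. The code is as follows: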

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public static void main(String[] args) throws Exception {
    // Load the data needed for the computation from the file
    DataModel model = new FileDataModel(new File("stack.csv"));

    // Tanimoto coefficient measures how similar two users are
    // (this line previously used LogLikelihoodSimilarity)
    UserSimilarity similarity = new TanimotoCoefficientSimilarity(model);

    // Neighborhood made up of the 2 users most similar to the given user
    UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);

    // Create the recommendation engine
    Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

    // Up to 5 recommendations for the user with ID 3
    // (arguments: user ID, number of items to recommend)
    List<RecommendedItem> recommendations = recommender.recommend(3, 5);

    for (RecommendedItem recommendation : recommendations) {
        System.out.println(recommendation);
    }
}

The output is as follows:

[main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Creating FileDataModel for file stack.csv
[main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Reading file info...
[main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Read lines: 696
RecommendedItem[item:589, value:1.0]
RecommendedItem[item:380, value:1.0]
RecommendedItem[item:2916, value:1.0]
RecommendedItem[item:3107, value:1.0]
RecommendedItem[item:2028, value:1.0]

Part of my data file is as follows:

    1   3408
    1   595
    1   2398
    1   2918
    1   2791
    1   2687
    1   3105
      .
      .
      .

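As far as I know, Tanimoto coefficient values normally lie between 0 and 1.0, but here only 1.0 is shown, which I don't think should be possible. Does anyone have an idea how to fix this? Is there a threshold I can change?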

Any help is much appreciated.

Thanks in advance.

1 Answer:

Answer 0: (score: 1)

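The Tanimoto coefficient, also known as the Jaccard coefficient, completely ignores preference values. It only considers whether a user has expressed some preference for an item (in other words, simply liked it), nothing more. How is it computed? The final value is the number of items for which both users have expressed some preference, divided by the number of items for which either user has expressed some preference.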
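The computation described above can be sketched in plain Java, without Mahout. This is an illustration only, not Mahout's code: the class name `TanimotoSketch` and both methods are hypothetical, and the second method assumes the recommender estimates a preference as a similarity-weighted average of the neighbours' values, as user-based recommenders commonly do. Under that assumption, data with no preference column (every stored value treated as 1.0) yields an estimate of 1.0 whatever the similarities are, which would match the output in the question.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class TanimotoSketch {

    // Tanimoto (Jaccard) coefficient: |A ∩ B| / |A ∪ B|.
    // Returning NaN for two empty sets is a choice made for this sketch.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<Long> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return Double.NaN;
        }
        return (double) intersection.size() / union.size();
    }

    // Similarity-weighted average of the neighbours' preference values
    // (assumed estimation scheme, see the paragraph above).
    static double weightedAverage(double[] similarities, double[] preferences) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            numerator += similarities[i] * preferences[i];
            denominator += similarities[i];
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Item IDs taken from the data sample; the second user is made up.
        Set<Long> userA = new HashSet<>(Arrays.asList(3408L, 595L, 2398L, 2918L));
        Set<Long> userB = new HashSet<>(Arrays.asList(595L, 2918L, 2791L, 2687L));

        // 2 shared items out of 6 distinct items
        System.out.println("similarity = " + tanimoto(userA, userB));

        // Two neighbours with different similarities, both "prefer" the item at 1.0
        double estimate = weightedAverage(new double[] {0.33, 0.2}, new double[] {1.0, 1.0});
        System.out.println("estimate   = " + estimate);
    }
}
```

The similarity itself varies between 0 and 1; it is the estimated preference that stays at 1.0 when every input value is 1.0. Mahout also ships a `GenericBooleanPrefUserBasedRecommender` intended for data without preference values; whether it suits this data set is not something the answer confirms.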

Read more about the Jaccard coefficient here: http://en.wikipedia.org/wiki/Jaccard_index

For more detail on Mahout's implementation, see the reference docs.