Mahout: Using two DataModels in one recommender

Time: 2012-09-27 20:30:55

Tags: java machine-learning mahout

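I'm trying to build a simple recommendation engine from two sets of boolean preference data. I want to use one dataset to compute the UserSimilarity and UserNeighborhoods, and then use those neighborhoods to make recommendations from the second set of boolean preference data.

I seem to have this working. The problem is that when I compute recommendations for a user who has neighbors based on the first dataset but does not appear in the second dataset (even though their neighbors do), no recommendations are produced.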

Here is the RecommenderBuilder code:

  recommenderBuilder = new RecommenderBuilder() {
      public Recommender buildRecommender(DataModel recommendationModel) throws TasteException {
          UserSimilarity similarity = new LogLikelihoodSimilarity(trainingModel);
          UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, 0.7, similarity, recommendationModel);

          return new GenericBooleanPrefUserBasedRecommender(recommendationModel, neighborhood, similarity);
      }
  };

Here is a sample of the trainingModel file:

1,111
2,222

2,111
2,222

3,111
3,222

and of the recommendationModel file:

1,91
1,92

2,91

Running this recommends 92 for user 2, but when it gets to user 3 it throws a NoSuchUserException.

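So... is there a way to produce recommendations from one dataset based on similarities computed on another dataset, without requiring every user to appear in the second dataset?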

Here is the full code I'm using:

private DataModel trainingModel;
private DataModel recommendationModel;
private RecommenderEvaluator evaluator;
private RecommenderIRStatsEvaluator evaluator2;
private RecommenderBuilder recommenderBuilder;
private DataModelBuilder modelBuilder;

@Override
public void afterPropertiesSet() throws IOException, TasteException {

    trainingModel = new GenericBooleanPrefDataModel(
        GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("/music.csv")))
    );

    recommendationModel = new GenericBooleanPrefDataModel(
            GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("/movies.csv")))
    );

    evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
    evaluator2 = new GenericRecommenderIRStatsEvaluator();


    recommenderBuilder = new RecommenderBuilder() {
        public Recommender buildRecommender(DataModel model) throws TasteException {
            UserSimilarity similarity = new LogLikelihoodSimilarity(trainingModel);
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, 0.7, similarity, model);

            return new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity);
        }
    };

    modelBuilder = new DataModelBuilder() {
        public DataModel buildDataModel( FastByIDMap<PreferenceArray> trainingData ) {
            return new GenericBooleanPrefDataModel( GenericBooleanPrefDataModel.toDataMap(trainingData) );
        }        
    };

}

and then I run this method:

    @Override
    public void testData() throws TasteException {

        double score = evaluator.evaluate(recommenderBuilder, modelBuilder, trainingModel, 0.9, 1.0);
        System.out.println("calculated score: " + score);

        try {
            IRStatistics stats = evaluator2.evaluate(
                    recommenderBuilder, modelBuilder, trainingModel, null, 2,
                    0.0,
                    1.0
            );
            System.out.println("recall: " + stats.getRecall());
            System.out.println("precision: " + stats.getPrecision());
        } catch (Throwable t) {
            System.out.println("throwing " + t);
        }

        List<RecommendedItem> recommendations = recommenderBuilder.buildRecommender(recommendationModel).recommend(1,2);
        System.out.println("user 1");
        for (RecommendedItem recommendation : recommendations) { System.out.println(recommendation);}

        recommendations = recommenderBuilder.buildRecommender(recommendationModel).recommend(2,2);
        System.out.println("user 2");
        for (RecommendedItem recommendation : recommendations) { System.out.println(recommendation);}

        try {
            recommendations = recommenderBuilder.buildRecommender(recommendationModel).recommend(3,2);
            System.out.println("user 3");
            for (RecommendedItem recommendation : recommendations) { System.out.println(recommendation);}
        } catch (Throwable t) {
            System.out.println("throwing " + t);
        }
    }

which produces this output:

  calculated score: 0.7033357620239258
  recall: 1.0
  precision: 1.0
  user 1
  user 2
  RecommendedItem [item:9222,value:0.8516679]
  throwing org.apache.mahout.cf.taste.common.NoSuchUserException: 3

1 Answer:

Answer 0 (score: 1)

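You can do what you're describing, in the way you're describing it. The dataset that drives the user similarity metric can indeed differ from the dataset from which recommendations are made; the user similarity metric can in fact be based on anything you like.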
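A minimal, library-free sketch of that separation, assuming boolean preferences held as `Map<Long, Set<Long>>` (user ID to item IDs) instead of Mahout `DataModel`s. The class `TwoSourceRecommender`, its parameters, and the scoring rule (summing neighbor similarities per candidate item) are illustrative assumptions, not Mahout's API. The similarity is any caller-supplied function, here a Tanimoto overlap on hypothetical "music" data swapped in for log-likelihood to keep it short, while neighbors and candidate items come from hypothetical "movies" data. A target user absent from the movies data is treated as owning no items rather than raising an error.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;
import java.util.stream.Collectors;

// Hypothetical sketch: similarity can come from any source; items come from a second dataset.
public class TwoSourceRecommender {
    private final Map<Long, Set<Long>> recommendationData;  // user ID -> item IDs (boolean prefs)
    private final ToDoubleBiFunction<Long, Long> similarity; // may be backed by a different dataset
    private final int maxNeighbors;
    private final double minSimilarity;

    public TwoSourceRecommender(Map<Long, Set<Long>> recommendationData,
                                ToDoubleBiFunction<Long, Long> similarity,
                                int maxNeighbors, double minSimilarity) {
        this.recommendationData = recommendationData;
        this.similarity = similarity;
        this.maxNeighbors = maxNeighbors;
        this.minSimilarity = minSimilarity;
    }

    public List<Long> recommend(long userId, int howMany) {
        // Nearest-N neighborhood, drawn from the users of the recommendation dataset.
        List<Long> neighbors = recommendationData.keySet().stream()
                .filter(other -> other != userId)
                .filter(other -> similarity.applyAsDouble(userId, other) >= minSimilarity)
                .sorted(Comparator.comparingDouble(
                        (Long other) -> similarity.applyAsDouble(userId, other)).reversed())
                .limit(maxNeighbors)
                .collect(Collectors.toList());

        // Items the target already has; a user missing from this dataset simply has none.
        Set<Long> owned = recommendationData.getOrDefault(userId, Collections.emptySet());
        Map<Long, Double> scores = new HashMap<>();
        for (Long neighbor : neighbors) {
            double sim = similarity.applyAsDouble(userId, neighbor);
            for (Long item : recommendationData.get(neighbor)) {
                if (!owned.contains(item)) {
                    scores.merge(item, sim, Double::sum); // sum of neighbor similarities
                }
            }
        }
        // Highest score first; ties broken by item ID for a stable order.
        return scores.entrySet().stream()
                .sorted(Map.Entry.<Long, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<Long, Double>comparingByKey()))
                .limit(howMany)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical "music" prefs used only for similarity.
        Map<Long, Set<Long>> music = Map.of(
                1L, Set.of(111L, 222L),
                2L, Set.of(111L, 222L),
                3L, Set.of(111L, 222L));
        // Hypothetical "movies" prefs used for recommendations (user 3 has none).
        Map<Long, Set<Long>> movies = Map.of(
                1L, Set.of(91L, 92L),
                2L, Set.of(91L));
        // Tanimoto (Jaccard) overlap on the music data.
        ToDoubleBiFunction<Long, Long> sim = (a, b) -> {
            Set<Long> x = music.getOrDefault(a, Set.of());
            Set<Long> y = music.getOrDefault(b, Set.of());
            Set<Long> union = new HashSet<>(x);
            union.addAll(y);
            if (union.isEmpty()) return 0.0;
            Set<Long> common = new HashSet<>(x);
            common.retainAll(y);
            return (double) common.size() / union.size();
        };
        TwoSourceRecommender r = new TwoSourceRecommender(movies, sim, 10, 0.7);
        System.out.println("user 2: " + r.recommend(2, 2)); // user 2: [92]
        System.out.println("user 3: " + r.recommend(3, 2)); // user 3: [91, 92]
    }
}
```

With this made-up data, user 2 gets item 92 and user 3, who has no movie preferences at all, gets 91 and 92 because both of its music neighbors own movie items.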

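However, it does need to be able to produce a user-user similarity for any pair of users in the dataset used to make recommendations. I suggest you handle this case specially in your UserSimilarity implementation, so that it returns 0 or something similar when one of the users is unknown.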
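The special case can be sketched as a small decorator around any similarity function. This is plain Java modeled on the advice, not an implementation of Mahout's `UserSimilarity` interface; the class `UnknownUserAsZero` and the toy overlap metric in `main` are hypothetical. It returns 0.0 when either user is missing from the similarity's own dataset and defers to the wrapped metric otherwise, so the metric itself never sees an unknown ID.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

// Hypothetical decorator: a pair involving an unknown user is scored 0.0 ("not similar")
// instead of letting the wrapped similarity fail on the missing user.
public class UnknownUserAsZero implements ToDoubleBiFunction<Long, Long> {
    private final Set<Long> knownUsers;                    // users present in the similarity dataset
    private final ToDoubleBiFunction<Long, Long> delegate; // the real metric, e.g. built on training data

    public UnknownUserAsZero(Set<Long> knownUsers, ToDoubleBiFunction<Long, Long> delegate) {
        this.knownUsers = knownUsers;
        this.delegate = delegate;
    }

    @Override
    public double applyAsDouble(Long user1, Long user2) {
        if (!knownUsers.contains(user1) || !knownUsers.contains(user2)) {
            return 0.0; // unknown on either side: no evidence of similarity
        }
        return delegate.applyAsDouble(user1, user2);
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> training = Map.of(
                1L, Set.of(111L, 222L),
                2L, Set.of(111L));
        // Toy overlap metric; it would throw NullPointerException for an unknown user,
        // standing in for a metric that throws NoSuchUserException.
        ToDoubleBiFunction<Long, Long> overlap = (a, b) -> {
            Set<Long> x = training.get(a);
            Set<Long> y = training.get(b);
            long common = x.stream().filter(y::contains).count();
            return (double) common / Math.max(x.size(), y.size());
        };
        UnknownUserAsZero sim = new UnknownUserAsZero(training.keySet(), overlap);
        System.out.println(sim.applyAsDouble(1L, 2L));  // 0.5
        System.out.println(sim.applyAsDouble(1L, 99L)); // 0.0
    }
}
```

In a Mahout setup the same membership check would sit at the top of a custom `UserSimilarity.userSimilarity(long, long)` implementation, before delegating to, for example, a `LogLikelihoodSimilarity` built on the training model.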