Connecting to MongoDB using Apache Mahout

Time: 2016-03-16 21:40:27

Tags: java mongodb mahout mongodb-java mahout-recommender

I am trying to generate recommendations with Apache Mahout, using MongoDB as the data source by building the data model with MongoDBDataModel. My code is as follows:

import java.net.UnknownHostException;
import java.util.List;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.mongodb.MongoDBDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import com.mongodb.MongoException;


public class usingMongo {
public static void main(String[] args) throws UnknownHostException, MongoException,
        TasteException {
    final long startTime = System.nanoTime();

    MongoDBDataModel model = new MongoDBDataModel("AdamsLaptop", 27017,
            "test", "ratings100k", false, false, null);
    System.out.println("connected to mongo ");

    UserSimilarity UserSim = new PearsonCorrelationSimilarity(model);

    UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.5, UserSim, model);

    UserBasedRecommender UserRecommender = new GenericUserBasedRecommender(model, neighborhood, UserSim);
    List<RecommendedItem> UserRecommendations = UserRecommender.recommend(1, 3);
    for (RecommendedItem recommendation : UserRecommendations) {
        System.out.println("You may like movie " + recommendation.getItemID() + " as a user similar to you also rated it " + recommendation.getValue() + " USER");
    }

    ItemSimilarity ItemSim = new PearsonCorrelationSimilarity(model);//LogLikelihoodSimilarity(model);

    GenericItemBasedRecommender ItemRecommender = new GenericItemBasedRecommender(model, ItemSim);
    List<RecommendedItem> ItemRecommendations = ItemRecommender.recommend(1, 3);
    for (RecommendedItem recommendation : ItemRecommendations) {
        System.out.println("You may like movie " + recommendation.getItemID() + " as a user similar to you also rated it " + recommendation.getValue() + " ITEM");
    }


    final long duration = System.nanoTime() - startTime;
    System.out.println(duration);
}
}

I can't see where I'm going wrong, but after many changes and a lot of trial and error, the error message stays the same:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.mahout.cf.taste.impl.model.mongodb.MongoDBDataModel.getID(MongoDBDataModel.java:743)
    at org.apache.mahout.cf.taste.impl.model.mongodb.MongoDBDataModel.buildModel(MongoDBDataModel.java:570)
    at org.apache.mahout.cf.taste.impl.model.mongodb.MongoDBDataModel.<init>(MongoDBDataModel.java:245)
    at recommender.usingMongo.main(usingMongo.java:24)

Any suggestions? Here is a sample of my data in MongoDB:

{ "_id" : ObjectId("56ddf61f5960960c333f3dcb"),"userId" : 1, "movieId" : 292, "rating" : 4, "timestamp" : 847116936 }

2 answers:

Answer 0 (score: 0)

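I successfully integrated MongoDB data with Mahout.

The structure of the data in MongoDB depends on the type of similarity algorithm you use. For example:

UserSimilarity

MongoDBDataModel datamodel = new MongoDBDataModel("127.0.0.1", 27017, "testing", "rating", true, true, null);

where user_id and item_id are integer values, preference is a float value, and created_at is a timestamp.

SVDRecommender

user_id and item_id are MongoDB objects, preference is a float value, and created_at is a timestamp.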

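The obvious troubleshooting step is to check whether the MongoDB server is running. Judging by the exception, it is. I think the problem is in your data structure.

Use user_id instead of userId, item_id instead of itemId, and preference instead of rating. I don't know whether this will make a difference. I followed one of the tutorials online, but I can't find it at the moment.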
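The renaming suggested above can be sketched on a single document. This is only an illustration with plain Java maps, not MongoDB driver or Mahout code: the class RatingFieldRenamer and its method are hypothetical names, the source keys (userId, movieId, rating, timestamp) come from the sample document in the question, and mapping timestamp to created_at is my assumption based on the field list above.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class RatingFieldRenamer {

    /**
     * Copies a rating document and renames its keys to
     * user_id, item_id, preference and created_at.
     * The source keys are the ones from the question's sample document.
     */
    public static Map<String, Object> toDefaultNames(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("user_id", doc.get("userId"));
        out.put("item_id", doc.get("movieId"));
        // The answer describes preference as a float, so convert the rating.
        out.put("preference", ((Number) doc.get("rating")).floatValue());
        // Assumption: the question's timestamp plays the role of created_at.
        out.put("created_at", doc.get("timestamp"));
        return out;
    }

    public static void main(String[] args) {
        // Sample document from the question, without the _id field.
        Map<String, Object> doc = new HashMap<>();
        doc.put("userId", 1);
        doc.put("movieId", 292);
        doc.put("rating", 4);
        doc.put("timestamp", 847116936);

        System.out.println(toDefaultNames(doc));
    }
}
```

In practice you would apply the same mapping to every document in the collection (for example while importing the data), then point MongoDBDataModel at the renamed collection.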

With more than 10,000 users and 1,000 items it worked, but it was too slow.

Answer 1 (score: 0)

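I think the problem is that Mahout assumes default names for the MongoDB fields that hold the user ID, item ID, and preference: user_id, item_id, and preference. So the solution may be either to use another MongoDBDataModel constructor that lets you pass the names of these fields in your MongoDB collection as arguments, or to redesign the collection schema.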
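To illustrate why the field names would produce exactly this kind of error, here is a toy model of the lookup. It is not Mahout's source: FieldNameLookupSketch, readId and sampleDoc are hypothetical names, and the assumption is that the model reads the ID field by name and then calls a method on the result, which would match the NullPointerException in getID from the stack trace.

```java
import java.util.HashMap;
import java.util.Map;

public class FieldNameLookupSketch {

    // Hypothetical stand-in for a data model that reads an ID from a document by field name.
    public static String readId(Map<String, Object> doc, String fieldName) {
        Object id = doc.get(fieldName); // null when the document has no such field
        return id.toString();           // throws NullPointerException if the field was missing
    }

    // One document shaped like the sample in the question (the _id field is omitted).
    public static Map<String, Object> sampleDoc() {
        Map<String, Object> doc = new HashMap<>();
        doc.put("userId", 1);
        doc.put("movieId", 292);
        doc.put("rating", 4);
        doc.put("timestamp", 847116936);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = sampleDoc();

        // Passing the field names the collection really uses works.
        System.out.println("userId  -> " + readId(doc, "userId"));
        System.out.println("movieId -> " + readId(doc, "movieId"));

        // Looking up the assumed default name fails, as in the stack trace.
        try {
            readId(doc, "user_id");
        } catch (NullPointerException e) {
            System.out.println("user_id -> NullPointerException (field not in document)");
        }
    }
}
```

If I remember correctly, the MongoDBDataModel in mahout-integration has a longer constructor that takes the user ID field, item ID field, preference field, and mapping collection names after the DateFormat argument. Treat that as an assumption and check the Javadoc for your Mahout version before relying on it.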

I hope this makes sense.