Mahout recommender performance issue

Date: 2015-09-21 16:36:02

Tags: mysql performance mahout

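I built a simple web-based (spring-boot) recommendation engine with Mahout:

  • Generic user-based recommender
  • Nearest-N user neighborhood (neighborhood size: 200, min similarity: 1)
  • Euclidean distance similarity (weighted)

All beans are decorated with their caching counterparts.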

The dataset is:

  • 4 million taste preferences
  • 400k distinct users
  • 2k items

It is read through a MySQLJDBCDataModel:

CREATE TABLE `taste_preferences` (
  `user_id` bigint(20) DEFAULT NULL,
  `item_id` int(11) NOT NULL DEFAULT '0',
  `preference` int(11) NOT NULL,
  `timestamp` datetime DEFAULT NULL,
  KEY `idx_taste_preferences_user_id` (`user_id`),
  KEY `idx_taste_preferences_item_id` (`item_id`),
  KEY `idx_taste_preferences_preference` (`preference`),
  KEY `idx_taste_preferences_distinct` (`user_id`,`item_id`,`preference`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

In this scenario I'm using a sampling rate of 0.003 (which I guess means using about 12K taste preferences).
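A note on that guess: as far as I can tell from the Mahout Javadoc, the samplingRate argument of NearestNUserNeighborhood is the fraction of users considered when building the neighborhood, not the fraction of preferences. Under that assumption, 0.003 would correspond to roughly 1,200 of the 400k users. The sketch below only illustrates the arithmetic with a per-user coin flip; the class and method names are hypothetical and it does not reproduce Mahout's own sampling iterator.

```java
import java.util.Random;

// Illustrative only: mimics keeping each user with probability samplingRate.
// This is an assumption about how the rate is applied, not Mahout's code.
class SamplingSketch {

    // Expected number of users kept for a given sampling rate.
    static long expectedSampledUsers(long totalUsers, double samplingRate) {
        return Math.round(totalUsers * samplingRate);
    }

    // Simulates the sampling with a fixed seed so that runs are repeatable.
    static int simulateSampledUsers(int totalUsers, double samplingRate, long seed) {
        Random random = new Random(seed);
        int kept = 0;
        for (int i = 0; i < totalUsers; i++) {
            if (random.nextDouble() < samplingRate) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println("expected users:  " + expectedSampledUsers(400_000L, 0.003));
        System.out.println("simulated users: " + simulateSampledUsers(400_000, 0.003, 42L));
    }
}
```

If this reading is right, each first-time recommendation still computes similarities against on the order of a thousand users, each of which needs its preferences fetched from the data model.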

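With this setup, the first recommendation for a given user still takes 10 to 20 seconds.

Given the same hardware, how would you suggest improving performance? Could a FileDataModel be faster?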

1 answer:

Answer 0: (score: 1)

OK, performance is definitely better now! The key point was to wrap the dataModel in a ReloadFromJDBCDataModel:

DataModel currentDataModel() throws TasteException {
    // ReloadFromJDBCDataModel loads the JDBC-backed model into memory,
    // so lookups no longer hit MySQL on every call.
    DataModel datamodel = new ReloadFromJDBCDataModel(
            new MySQLJDBCDataModel(new ConnectionPoolDataSource(datasource), preferenceTable, userIDColumn,
                    itemIDColumn, preferenceColumn, timestampColumn));
    return datamodel;
}

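The dataModel in this scenario is read-only; with data that changes, this could become a problem because of the automatic reloading that happens behind the scenes.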
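To make the effect of that wrapper concrete, here is a minimal sketch of the same idea in plain Java: a decorator that copies everything from a slow source into memory once and serves lookups from the copy until refresh() is called. The names PreferenceSource, SlowSource and InMemorySnapshot are hypothetical stand-ins, not Mahout classes, and the real ReloadFromJDBCDataModel does considerably more.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class SnapshotDemo {
    public static void main(String[] args) {
        Map<Long, Map<Long, Float>> rows = new HashMap<>();
        rows.put(1L, Collections.singletonMap(10L, 4.0f));
        SlowSource db = new SlowSource(rows);
        InMemorySnapshot model = new InMemorySnapshot(db);
        for (int i = 0; i < 1000; i++) {
            model.preferencesFor(1L);
        }
        // Only the initial bulk load reached the slow source.
        System.out.println("round trips to the source: " + db.queries.get());
    }
}

// Hypothetical minimal interface; Mahout's DataModel is much richer.
interface PreferenceSource {
    Map<Long, Float> preferencesFor(long userId);
}

// Stand-in for a JDBC-backed model: counts every access to the "database".
class SlowSource implements PreferenceSource {
    final AtomicInteger queries = new AtomicInteger();
    private final Map<Long, Map<Long, Float>> rows;

    SlowSource(Map<Long, Map<Long, Float>> rows) {
        this.rows = rows;
    }

    @Override
    public Map<Long, Float> preferencesFor(long userId) {
        queries.incrementAndGet();
        return rows.getOrDefault(userId, Collections.emptyMap());
    }

    // One bulk read of all preferences, returned as an independent copy.
    Map<Long, Map<Long, Float>> loadAll() {
        queries.incrementAndGet();
        Map<Long, Map<Long, Float>> copy = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Float>> e : rows.entrySet()) {
            copy.put(e.getKey(), new HashMap<>(e.getValue()));
        }
        return copy;
    }
}

// Decorator that answers lookups from memory and reloads only on refresh().
class InMemorySnapshot implements PreferenceSource {
    private final SlowSource delegate;
    private volatile Map<Long, Map<Long, Float>> snapshot;

    InMemorySnapshot(SlowSource delegate) {
        this.delegate = delegate;
        refresh();
    }

    // Replaces the in-memory copy with a fresh bulk load.
    void refresh() {
        snapshot = delegate.loadAll();
    }

    @Override
    public Map<Long, Float> preferencesFor(long userId) {
        return snapshot.getOrDefault(userId, Collections.emptyMap());
    }
}
```

The trade-off is the one hinted at above: rows written to the source after the load stay invisible until the next refresh(), which is harmless for read-only data but matters once preferences change.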

For completeness, the important parts of my configuration are:

UserSimilarity similarity(DataModel dataModel) throws TasteException {
    return new CachingUserSimilarity(new EuclideanDistanceSimilarity(dataModel, Weighting.WEIGHTED), dataModel);
}

UserNeighborhood userNeighborhood;

UserNeighborhood neighborhood(DataModel dataModel, UserSimilarity userSimilarity) throws TasteException {

    if (useThresholdUserNeighborhood) {
        logger.info("Using ThresholdUserNeighborhood - threshold value is {}", threshold);
        userNeighborhood = new CachingUserNeighborhood(
                new ThresholdUserNeighborhood(threshold, userSimilarity, dataModel), dataModel);
    } else {
        logger.info(
                "Using NearestNUserNeighborhood - neighborhood size is {}, min similarity is {}, sampling rate is {}",
                neighborhoodSize, minSimilarity, samplingRate);
        userNeighborhood = new CachingUserNeighborhood(new NearestNUserNeighborhood(neighborhoodSize, minSimilarity,
                userSimilarity, dataModel, samplingRate), dataModel);
    }
    return userNeighborhood;
}

@Bean
public Recommender buildRecommender(DataModel dataModel) throws TasteException {

    UserSimilarity userSimilarity = similarity(dataModel);
    return new CachingRecommender(
            new GenericUserBasedRecommender(dataModel, neighborhood(dataModel, userSimilarity), userSimilarity));
}
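For readers unfamiliar with what the similarity and neighborhood beans compute, the following plain-Java sketch shows the general idea: score every other user by a Euclidean-distance-based similarity over co-rated items, drop those below a minimum similarity, and keep the top N. It is a simplified illustration under my own assumptions. The formula 1 / (1 + distance) is a common textbook form and is not claimed to match Mahout's EuclideanDistanceSimilarity or its weighting, and NeighborhoodSketch with its methods is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NeighborhoodSketch {

    // Similarity in (0, 1]: 1 / (1 + Euclidean distance over co-rated items).
    // Returns NaN when the two users share no items.
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sumSquares = 0.0;
        int shared = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double diff = e.getValue() - other;
                sumSquares += diff * diff;
                shared++;
            }
        }
        return shared == 0 ? Double.NaN : 1.0 / (1.0 + Math.sqrt(sumSquares));
    }

    // IDs of the n most similar users whose similarity is at least minSimilarity.
    static List<Long> nearestN(long target, Map<Long, Map<Long, Double>> prefs,
                               int n, double minSimilarity) {
        Map<Long, Double> targetPrefs = prefs.get(target);
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> e : prefs.entrySet()) {
            if (e.getKey() == target) {
                continue; // a user is not their own neighbor
            }
            double s = similarity(targetPrefs, e.getValue());
            if (!Double.isNaN(s) && s >= minSimilarity) {
                scores.put(e.getKey(), s);
            }
        }
        List<Long> ids = new ArrayList<>(scores.keySet());
        // Most similar first.
        ids.sort(Comparator.comparingDouble((Long id) -> scores.get(id)).reversed());
        return ids.size() > n ? new ArrayList<>(ids.subList(0, n)) : ids;
    }

    // Tiny made-up dataset: userId -> (itemId -> preference).
    static Map<Long, Map<Long, Double>> demoData() {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        prefs.put(1L, ratings(10, 5, 11, 3));
        prefs.put(2L, ratings(10, 5, 11, 3)); // identical to user 1
        prefs.put(3L, ratings(10, 4, 11, 3)); // differs by 1 on item 10
        prefs.put(4L, ratings(10, 1, 11, 1)); // far away
        prefs.put(5L, ratings(99, 5, 98, 5)); // no shared items
        return prefs;
    }

    private static Map<Long, Double> ratings(long i1, double p1, long i2, double p2) {
        Map<Long, Double> m = new HashMap<>();
        m.put(i1, p1);
        m.put(i2, p2);
        return m;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> prefs = demoData();
        System.out.println(nearestN(1L, prefs, 2, 0.0));
        System.out.println(nearestN(1L, prefs, 10, 1.0));
    }
}
```

In this sketch a minimum similarity of 1.0 admits only users whose ratings on shared items are identical, so the threshold can shrink the neighborhood far below the requested size. Whether the same holds for the min similarity of 1 in the configuration above depends on Mahout's actual similarity values, which this example does not reproduce.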