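Cassandra-based Mahout user friend recommendations

Time: 2014-02-26 20:17:35

Tags: cassandra mahout mahout-recommender

I want to recommend to a user a list of other users that the current user could add as friends.

I am using Cassandra and Mahout. The Mahout integration package already contains an implementation called CassandraDataModel, and I want to use that class.

So my recommender class looks like this: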

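import java.util.List;
import javax.inject.Inject; // assumption: JSR-330; the original does not show which @Inject is used

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.cassandra.CassandraDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.AveragingPreferenceInferrer;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class UserFriendsRecommender {

    @Inject
    private CassandraDataModel dataModel;

    public List<RecommendedItem> recommend(Long userId, int number) throws TasteException {
        UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(dataModel);
        // Optional:
        userSimilarity.setPreferenceInferrer(new AveragingPreferenceInferrer(dataModel));

        // Use the 3 most similar users as the neighborhood
        UserNeighborhood neighborhood =
                new NearestNUserNeighborhood(3, userSimilarity, dataModel);
        Recommender recommender =
                new GenericUserBasedRecommender(dataModel, neighborhood, userSimilarity);
        Recommender cachingRecommender = new CachingRecommender(recommender);
        List<RecommendedItem> recommendations = cachingRecommender.recommend(userId, number);
        return recommendations;
    }
}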

CassandraDataModel uses 4 column families:

static final String USERS_CF = "users";
static final String ITEMS_CF = "items";
static final String USER_IDS_CF = "userIDs";
static final String ITEM_IDS_CF = "itemIDs";

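I am having trouble understanding this class, especially the column families. Is there an example I can look at, or could someone explain it with a small example?

The javadoc says this: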

* <p>
 * First, it uses a column family called "users". This is keyed by the user ID
 * as an 8-byte long. It contains a column for every preference the user
 * expresses. The column name is item ID, again as an 8-byte long, and value is
 * a floating point value represented as an IEEE 32-bit floating point value.
 * </p>
 * 
 * <p>
 * It uses an analogous column family called "items" for the same data, but
 * keyed by item ID rather than user ID. In this column family, column names are
 * user IDs instead.
 * </p>
 * 
 * <p>
 * It uses a column family called "userIDs" as well, with an identical schema.
 * It has one row under key 0. It contains a column for every user ID in the
 * model. It has no values.
 * </p>
 * 
 * <p>
 * Finally it also uses an analogous column family "itemIDs" containing item
 * IDs.
 * </p>
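
To make the javadoc above more concrete, here is a minimal in-memory sketch of the four column families as nested maps (row key, then column name, then value). It is only an illustration of the layout: the class `ColumnFamilySketch` and the method `setPreference` are hypothetical names, nothing here talks to Cassandra or uses Mahout, and the user and item IDs are made up.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for the four column families; not Mahout or Cassandra code. */
class ColumnFamilySketch {
    // "users": row key = user ID, column name = item ID, value = preference
    final Map<Long, Map<Long, Float>> users = new TreeMap<>();
    // "items": same data, but row key = item ID and column name = user ID
    final Map<Long, Map<Long, Float>> items = new TreeMap<>();
    // "userIDs" / "itemIDs": a single row under key 0, one column per ID, empty values
    final Map<Long, Map<Long, String>> userIDs = new TreeMap<>();
    final Map<Long, Map<Long, String>> itemIDs = new TreeMap<>();

    /** Records one preference; note that it touches all four column families. */
    void setPreference(long userID, long itemID, float value) {
        users.computeIfAbsent(userID, k -> new TreeMap<>()).put(itemID, value);
        items.computeIfAbsent(itemID, k -> new TreeMap<>()).put(userID, value);
        userIDs.computeIfAbsent(0L, k -> new TreeMap<>()).put(userID, "");
        itemIDs.computeIfAbsent(0L, k -> new TreeMap<>()).put(itemID, "");
    }

    public static void main(String[] args) {
        ColumnFamilySketch cf = new ColumnFamilySketch();
        cf.setPreference(1L, 10L, 3.0f); // made-up IDs and values
        cf.setPreference(2L, 10L, 4.5f);
        System.out.println("users   = " + cf.users);
        System.out.println("items   = " + cf.items);
        System.out.println("userIDs = " + cf.userIDs);
        System.out.println("itemIDs = " + cf.itemIDs);
    }
}
```

The point of the sketch is that "users" and "items" hold the same preferences indexed two ways, while "userIDs" and "itemIDs" are just single-row indexes of every known ID.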

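2 Answers:

Answer 0: (score: 2)

All of the following instructions for the column families required by CassandraDataModel should be executed in cassandra-cli, under the keyspace you created (recommender or another name).

1: Table users

userID is the row key, each itemID has its own column name, and the value is the preference: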

CREATE COLUMN FAMILY users
WITH comparator = LongType
AND key_validation_class=LongType
AND default_validation_class=FloatType;

Insert values:

set users[0][0]='1.0';
set users[1][0]='3.0';
set users[2][2]='1.0';

2: Table items

itemID is the row key, each userID has its own column name, and the value is the preference:

CREATE COLUMN FAMILY items
WITH comparator = LongType
AND key_validation_class=LongType
AND default_validation_class=FloatType;

Insert values:

set items[0][0]='1.0';
set items[0][1]='3.0';
set items[2][2]='1.0';

3: Table userIDs

This table has only one row but many columns, i.e. each userID has its own column:

CREATE COLUMN FAMILY userIDs
WITH comparator = LongType
AND key_validation_class=LongType;

Insert values:

set userIDs[0][0]='';
set userIDs[0][1]='';
set userIDs[0][2]='';

4: Table itemIDs

This table has only one row but many columns, i.e. each itemID has its own column:

CREATE COLUMN FAMILY itemIDs
WITH comparator = LongType
AND key_validation_class=LongType;

Insert values:

set itemIDs[0][0]='';
set itemIDs[0][1]='';
set itemIDs[0][2]='';
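
Since every preference has to be written to all four column families, it can help to generate the cassandra-cli statements instead of typing them by hand. The following is a rough sketch of that idea, assuming the column family names and the single row key 0 described above; `CliStatementSketch` and `statementsFor` are hypothetical names, and the code only builds strings, it does not connect to Cassandra.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that prints cassandra-cli `set` statements for preferences. */
class CliStatementSketch {

    /** Builds the four statements that a single (userID, itemID, value) preference requires. */
    static List<String> statementsFor(long userID, long itemID, float value) {
        List<String> out = new ArrayList<>();
        out.add("set users[" + userID + "][" + itemID + "]='" + value + "';");
        out.add("set items[" + itemID + "][" + userID + "]='" + value + "';");
        out.add("set userIDs[0][" + userID + "]='';"); // single row under key 0
        out.add("set itemIDs[0][" + itemID + "]='';"); // single row under key 0
        return out;
    }

    public static void main(String[] args) {
        // The three preferences used in the answer: {userID, itemID, value}
        float[][] prefs = { {0, 0, 1.0f}, {1, 0, 3.0f}, {2, 2, 1.0f} };
        for (float[] p : prefs) {
            for (String stmt : statementsFor((long) p[0], (long) p[1], p[2])) {
                System.out.println(stmt);
            }
        }
    }
}
```

Repeated statements for the same user or item ID are harmless here, because setting the same column again simply overwrites it.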

Answer 1: (score: 0)

To supplement the answer above: for Cassandra 2.0 the new syntax is as follows, because the CLI has been deprecated.

Table users:

CREATE TABLE users (userID bigint, itemID bigint, value float, PRIMARY KEY (userID, itemID));

Table items:

CREATE TABLE items (itemID bigint, userID bigint, value float, PRIMARY KEY (itemID, userID));

Table userIDs:

CREATE TABLE userIDs (id bigint, userID bigint, PRIMARY KEY (id, userID));

Table itemIDs:

CREATE TABLE itemIDs (id bigint, itemID bigint, PRIMARY KEY (id, itemID));