Example (key, value) pair in the mapper: (user, (loginCount, commentCount))
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // Log line format: timestamp,activity,user,...
    String[] stringData = value.toString().split(",");
    String activity = stringData[1];
    String user = stringData[2];
    if (activity.equals("login")) {
        outCount.set(1, 0);
    } else if (activity.equals("comment")) {
        outCount.set(0, 1);
    } else {
        return; // neither a login nor a comment: emit nothing
    }
    outUserID.set(user);
    context.write(outUserID, outCount);
}
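I count the logins and the comments of each user. Now I want to change the counting: still count every login, but only check whether the user wrote a comment at all. How can I make my mapper or reducer look for just one comment per user and "ignore" all other comments from that user?
Edit
Log file: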
2013-01-01T16:50:56.056+0100,login,User14133,somedata,somedata
2013-01-01T16:55:56.056+0100,login,User14133,somedata,somedata
2013-01-01T05:20:44.044+0100,comment,User14133,somedata,somedata,{text: "something here"}
2013-01-01T05:24:44.044+0100,comment,User14133,somedata,somedata,{text: "something here"}
2013-01-01T20:50:13.013+0100,login,User76892,somedata,somedata
Current output:
User14133 Logins: 2 Comments: 2
User76892 Logins: 1 Comments: 0
Input:
Mapper<LongWritable, Text, Text, UserCount>
Reducer<Text, UserCount, Text, UserCount>
public static class UserCount implements Writable {
    public UserCount() {
        set(new IntWritable(0), new IntWritable(0));
    }
My MapReduce job counts every login and every comment of a user and sums them up. What I want to achieve is something like this -> Output:
User14133 Logins: 2 Comments: 0 or 1 (Did the user write a comment?)*
* In Mapper or Reducer (?)
for every line in the log{
if (user wrote comment){
return 1;
ignore all other comments from same user in this log;
} else if (user didn't write anything) return 0;
}
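The pseudocode above can be sketched in plain Java, without the Hadoop types, so that it runs standalone. Logins are still summed, while the comment count is treated as a flag that never goes above 1. The names `CommentFlagSketch` and `summarize` are hypothetical, and the field order (timestamp, activity, user) is taken from the log excerpt. In a real job the same capping would most likely sit in the reducer, for example by writing `Math.min(1, commentSum)` instead of the raw sum.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CommentFlagSketch {

    // Returns user -> {loginCount, commentFlag}. commentFlag is 1 if the user
    // wrote at least one comment in the log, otherwise 0.
    static Map<String, int[]> summarize(List<String> logLines) {
        Map<String, int[]> perUser = new LinkedHashMap<>();
        for (String line : logLines) {
            String[] fields = line.split(",");
            if (fields.length < 3) {
                continue; // skip malformed lines
            }
            String activity = fields[1];
            String user = fields[2];
            int[] counts = perUser.computeIfAbsent(user, u -> new int[2]);
            if (activity.equals("login")) {
                counts[0]++;       // every login is counted
            } else if (activity.equals("comment")) {
                counts[1] = 1;     // further comments leave the flag at 1
            }
        }
        return perUser;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "2013-01-01T16:50:56.056+0100,login,User14133,somedata,somedata",
            "2013-01-01T16:55:56.056+0100,login,User14133,somedata,somedata",
            "2013-01-01T05:20:44.044+0100,comment,User14133,somedata,somedata,{text: \"something here\"}",
            "2013-01-01T05:24:44.044+0100,comment,User14133,somedata,somedata,{text: \"something here\"}",
            "2013-01-01T20:50:13.013+0100,login,User76892,somedata,somedata");
        summarize(log).forEach((user, c) ->
            System.out.println(user + " Logins: " + c[0] + " Comments: " + c[1]));
    }
}
```

Run against the five log lines above, this prints `User14133 Logins: 2 Comments: 1` and `User76892 Logins: 1 Comments: 0`.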
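Answer 0 (score: 0)
If I understand correctly, you just want the total number of unique users who logged in, plus the total number of comments?
I suggest using the "aggregate" reducer in Hadoop.
In your mapper, emit output lines that look like this: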
UniqValueCount:unique_users User14133
LongValueSum:comments 1
UniqValueCount:unique_users User14133
LongValueSum:comments 1
UniqValueCount:unique_users User14133
LongValueSum:comments 1
UniqValueCount:unique_users User14133
LongValueSum:comments 1
UniqValueCount:unique_users User76892
LongValueSum:comments 1
Then run the "aggregate" reducer, and you should get output that looks like:
unique_users 2
comments 5
I assume this is what you want?
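A minimal sketch of the mapper side, written in plain Java so it can be tried without a cluster. It turns one log line into records of the form `Type:id<TAB>value`, which is the input the aggregate reducer expects. The class and method names are hypothetical. One assumption differs from the listing above: a `LongValueSum:comments` record is emitted only for comment lines, not for every line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AggregateRecordsSketch {

    // Builds the aggregate-style records for a single log line.
    // Assumed field order: timestamp,activity,user,...
    static List<String> toAggregateRecords(String logLine) {
        List<String> records = new ArrayList<>();
        String[] fields = logLine.split(",");
        if (fields.length < 3) {
            return records; // malformed line: emit nothing
        }
        String activity = fields[1];
        String user = fields[2];
        // Every line contributes its user to the distinct-user count.
        records.add("UniqValueCount:unique_users\t" + user);
        // Assumption: only comment lines add to the comment total.
        if (activity.equals("comment")) {
            records.add("LongValueSum:comments\t1");
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "2013-01-01T16:50:56.056+0100,login,User14133,somedata,somedata",
            "2013-01-01T05:20:44.044+0100,comment,User14133,somedata,somedata",
            "2013-01-01T20:50:13.013+0100,login,User76892,somedata,somedata");
        for (String line : log) {
            toAggregateRecords(line).forEach(System.out::println);
        }
    }
}
```

In a real mapper each record would be written as a key/value pair (the part before the tab as key, the part after as value) rather than printed.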