I have the following sample data, which I am using to learn Hadoop MapReduce. It is follower/followee data:
Follower,followee
a,b
a,c
a,d
c,b
b,d
d,a
b,c
b,e
e,f
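That is, a follows b, a follows c, and so on.
I am trying to process the data so that if a follows b and b also follows a, that pair appears in the output text file. I am new to MapReduce and am trying to find a way to get the following result: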
a,d
c,b
Answer 0 (score: 3)
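You can achieve this with a trick.
The trick is to pass keys to the reducer in such a way that (a,d) and (d,a) have the same key and end up in the same reducer:
When (a,d) arrives:
'a' < 'd', hence emit:
key => a,d
value => a,d
When (d,a) arrives:
'd' > 'a', hence emit:
key => a,d
value => d,a
The key is always formed so that the lower letter comes before the higher one. So for both records, the key is "a,d".
So the output of the mapper will be: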
Record: a,b
Key = a,b Value = a,b
Record: a,c
Key = a,c Value = a,c
Record: a,d
Key = a,d Value = a,d
Record: c,b
Key = b,c Value = c,b
Record: b,d
Key = b,d Value = b,d
Record: d,a
Key = a,d Value = d,a
Record: b,c
Key = b,c Value = b,c
Record: b,e
Key = b,e Value = b,e
Record: e,f
Key = e,f Value = e,f
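The mapper step above can be sketched in plain Java, without the Hadoop API, to make the key rule concrete. The class name `MapperKeySketch` and the helpers `canonicalKey` and `map` are hypothetical names chosen for this illustration; in a real job the same comparison would sit inside the mapper's `map` method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MapperKeySketch {
    // Hypothetical helper: builds an order-independent key, smaller string first.
    static String canonicalKey(String follower, String followee) {
        return follower.compareTo(followee) <= 0
                ? follower + "," + followee
                : followee + "," + follower;
    }

    // Simulates the map phase: one "Key = ... Value = ..." line per input record.
    static List<String> map(List<String> records) {
        List<String> out = new ArrayList<>();
        for (String record : records) {
            String[] parts = record.split(",");
            String key = canonicalKey(parts[0].trim(), parts[1].trim());
            out.add("Key = " + key + " Value = " + record);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "a,b", "a,c", "a,d", "c,b", "b,d", "d,a", "b,c", "b,e", "e,f");
        for (String line : map(input)) {
            System.out.println(line);
        }
    }
}
```

The value keeps the original direction of the record, so the reducer can still tell (a,d) apart from (d,a) even though both share a key.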
Now, in the reducers, the records will arrive grouped by key, in the following order:
Record 1:
Key = a,b Value = a,b
Record 2:
Key = a,c Value = a,c
Record 3:
Key = a,d Value = a,d
Key = a,d Value = d,a
Record 4:
Key = b,c Value = c,b
Key = b,c Value = b,c
Record 5:
Key = b,d Value = b,d
Record 6:
Key = b,e Value = b,e
Record 7:
Key = e,f Value = e,f
So, in the reducer, you only need to handle records 3 and 4, the only keys that have two values:
Record 3:
Key = a,d Value = a,d
Key = a,d Value = d,a
Record 4:
Key = b,c Value = c,b
Key = b,c Value = b,c
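The grouping and the reducer-side check can be simulated in plain Java as a rough sketch. Everything named here (`MutualFollowSketch`, `shuffle`, `reduce`) is hypothetical and only stands in for what Hadoop's shuffle and a reducer would do; it assumes a key is "mutual" when its values contain both orderings of the pair, and it emits the first ordering seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class MutualFollowSketch {
    // Order-independent key, smaller string first (same rule as the mapper).
    static String canonicalKey(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "," + b : b + "," + a;
    }

    // Stands in for the shuffle: groups the original records by key, sorted by key.
    static Map<String, List<String>> shuffle(List<String> records) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String record : records) {
            String[] p = record.split(",");
            groups.computeIfAbsent(canonicalKey(p[0], p[1]), k -> new ArrayList<>())
                  .add(record);
        }
        return groups;
    }

    // Stands in for the reducer: keep a key only if both directions are present.
    static List<String> reduce(Map<String, List<String>> groups) {
        List<String> out = new ArrayList<>();
        for (List<String> values : groups.values()) {
            Set<String> distinct = new LinkedHashSet<>(values);
            if (distinct.size() == 2) {
                out.add(values.get(0)); // emit the direction that was seen first
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "a,b", "a,c", "a,d", "c,b", "b,d", "d,a", "b,c", "b,e", "e,f");
        for (String pair : reduce(shuffle(input))) {
            System.out.println(pair);
        }
    }
}
```

In this simulation the values keep their input order, which is why the first value can be emitted as-is. In an actual Hadoop job the order of values within a key is not guaranteed unless a secondary sort is set up, so a real reducer may need to pick the direction to emit by some other rule.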
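So, the output will be:
a,d
c,b
This logic also works if you have names instead of single letters. For example, you can use the following logic in the mapper (where s1 is the first string and s2 is the second string):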
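// Build the same key whichever order s1 and s2 arrive in.
// Note: here the lexicographically larger string (ignoring case) goes first.
String key;
int compare = s1.compareToIgnoreCase(s2);
if (compare >= 0) {
    key = s1 + "," + s2;
} else {
    key = s2 + "," + s1;
}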
So, if you have:
String s1 = "Stack";
String s2 = "Overflow";
The key will be:
Stack,Overflow
Similarly, if you have:
s1 = "Overflow";
s2 = "Stack";
The key will still be:
Stack,Overflow
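As a quick check, the comparison from the mapper snippet can be wrapped in a small method and called with both orderings. The class `NameKeySketch` and the methods `key` and `normalizedKey` are hypothetical names for this sketch. One edge case worth noting: `compareToIgnoreCase` returns 0 for names that differ only in case, so `key` builds two different keys for "Stack"/"stack" depending on arrival order. Whether such names are the same user is an assumption about the data; `normalizedKey` shows one possible way to handle it if they are.

```java
import java.util.Locale;

class NameKeySketch {
    // Same comparison as the mapper snippet: larger string (ignoring case) first.
    static String key(String s1, String s2) {
        return s1.compareToIgnoreCase(s2) >= 0 ? s1 + "," + s2 : s2 + "," + s1;
    }

    // Hypothetical variant: lower-case both names first, so that names
    // differing only in case always produce the same key.
    static String normalizedKey(String s1, String s2) {
        String a = s1.toLowerCase(Locale.ROOT);
        String b = s2.toLowerCase(Locale.ROOT);
        return a.compareTo(b) >= 0 ? a + "," + b : b + "," + a;
    }

    public static void main(String[] args) {
        System.out.println(key("Stack", "Overflow"));
        System.out.println(key("Overflow", "Stack"));
        System.out.println(key("Stack", "stack"));
        System.out.println(key("stack", "Stack"));
        System.out.println(normalizedKey("Stack", "stack"));
        System.out.println(normalizedKey("stack", "Stack"));
    }
}
```

If names that differ only in case should count as different users instead, a plain case-sensitive `compareTo` in `key` would keep them apart while still producing one key per pair.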