I have a problem summing two log files.
Sample files:
file-1
id user view
1 AAA 2
2 BBB 5
3 CCC 9
file-2
id user view address
1 AAA 5 XXX
2 BBB 2 YYY
6 FFF 4 ZZZ
I want to join the two files by id and sum the view values, which should give this output:
Output:
id user view address
1 AAA 7 XXX
2 BBB 7 YYY
I tried code that joins the two files, but I can't work out how to sum the views:
My code:
inputdata = LOAD '/user/hdfs/tes/part-1' AS (
id:chararray,
user:chararray,
view:int
);
inputdata2 = LOAD '/user/hdfs/tes/part-2' AS (
id:chararray,
user:chararray,
view:int,
address:chararray
);
joined = JOIN inputdata BY id LEFT OUTER, inputdata2 by id;
outputlist = FOREACH joined {
GENERATE
inputdata::id,
inputdata::user,
--sum(inputdata2::view),
inputdata2::address;
}
dump outputlist;
How can I sum the view values across the two log files?
Thanks.
Answer 0 (score: 2)
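Take the result of the join in a FOREACH and add the two view fields with the + operator. SUM won't work here: it is an aggregate function that operates on a bag, not on two scalar fields in the same row. This works: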
A = LOAD 'file1.dat' USING PigStorage(' ') AS (a:chararray, b:chararray, c:int);
B = LOAD 'file2.dat' USING PigStorage(' ') AS (a:chararray, b:chararray, c:int, d:chararray);
-- inner join: only ids present in both files are kept
C = JOIN A BY a, B BY a;
-- add the two view columns row by row with +
D = FOREACH C GENERATE A::a AS id, A::b AS user, A::c + B::c AS view, B::d AS address;
DUMP D;
Output:
(1,AAA,7,XXX)
(2,BBB,7,YYY)
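The question's attempt used LEFT OUTER, while the answer uses an inner join, which drops id 3 (only in file-1) and id 6 (only in file-2). If you instead want to keep every file-1 row, a minimal sketch, assuming the same file names and space-delimited layout as the answer, is to keep the LEFT OUTER join and substitute 0 for the missing view. For unmatched rows every B:: field is null, and in Pig null + int yields null, so the bincond operator is used to guard the addition:

```pig
A = LOAD 'file1.dat' USING PigStorage(' ') AS (a:chararray, b:chararray, c:int);
B = LOAD 'file2.dat' USING PigStorage(' ') AS (a:chararray, b:chararray, c:int, d:chararray);
-- keep all rows of A; B fields are null where there is no matching id
C = JOIN A BY a LEFT OUTER, B BY a;
-- treat a missing file-2 view as 0 so the sum is not null
D = FOREACH C GENERATE
        A::a AS id,
        A::b AS user,
        A::c + (B::c IS NULL ? 0 : B::c) AS view,
        B::d AS address;
DUMP D;
```

With the sample data this should yield the same two rows as above plus a row for id 3 with view 9 and an empty address; row order in the dump is not guaranteed.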