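I am trying to load two files from HDFS into Pig. After joining the drivers relation with the truck events relation, I want to count the result. How do I count the rows in a relation? I tried the following, but it gives me one count per driver instead of a single total: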
truck_temp = FOREACH (GROUP truck_join BY drivers_info::driverId) { GENERATE group, COUNT(truck_join); };
drivers_load = LOAD '/Pig-Practice/drivers.csv' USING PigStorage(',') AS (driverId:int,name:chararray,ssn:biginteger,location:chararray,certified:chararray,wageplan:chararray);
drivers_info = FOREACH ( GROUP drivers_load BY (driverId,name)) GENERATE group.driverId,group.name;
event_load = LOAD '/Pig-Practice/truck_event_text_partition.csv' USING PigStorage(',') AS (driverId:int, truckId:int, eventTime:chararray,
eventType:chararray, longitude:double, latitude:double,
eventKey:chararray, correlationId:long, driverName:chararray,
routeId:long,routeName:chararray,eventDate:chararray);
truck_events1 = FILTER event_load BY $0 >1;
truck_events2 = FOREACH (GROUP truck_events1 BY (driverId,driverName,routeId,routeName) ) GENERATE group.driverId,group.driverName,group.routeId,group.routeName;
truck_join = JOIN drivers_info BY driverId, truck_events2 BY driverId;
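Answer 0 (score: 0)
To get the total count after the join, you need to group the whole relation with GROUP ALL.
COUNT requires a preceding GROUP ALL statement for a global count and a GROUP BY statement for per-group counts. Your attempt groups by drivers_info::driverId, so it returns one count for each driver. Reference: COUNT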
truck_temp = FOREACH (GROUP truck_join ALL)
{
    GENERATE COUNT(truck_join);
};
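
The same idea can also be written as two separate statements, which makes it easier to inspect the intermediate relation. This is only a sketch that continues the script from the question: it assumes truck_join is already defined as shown there, and the aliases truck_all, truck_count and total_rows are names made up for illustration.

```pig
-- Put every tuple of the joined relation into one group (its key is the string 'all').
truck_all = GROUP truck_join ALL;

-- One group in, one row out: a single total instead of one count per driver.
-- COUNT skips tuples whose first field is null; use COUNT_STAR(truck_join)
-- instead if those tuples should be counted too.
truck_count = FOREACH truck_all GENERATE COUNT(truck_join) AS total_rows;

-- Print the single-row result.
DUMP truck_count;
```

Because GROUP ALL produces exactly one group, truck_count should contain a single tuple holding the total number of rows in truck_join.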