How do I join and find values in Pig?

Time: 2017-03-03 19:33:02

Tags: hadoop apache-pig

How can I join these two tables and find the customer IDs in the NDakota region whose amount is greater than 1600?

1 plas 1 alaska robert
2 Boston Lily
3 NDakota Michael
4 NDakota Will
5 NDakota Mark

1A 1 09/09/2012 1200
2A 2 8/9/2016 3400
3B 3 4/5/2016 2300

customers = LOAD '/home/vis/Documents/customers' using PigStorage(' ') AS (cust_id:int,region:chararray,name:chararray);

sales = LOAD '/home/vis/Documents/sales' using PigStorage(' ') AS (sales_id:int,cust_id:int,date:datetime,amount:int);

salesNA = FILTER customers BY region =='NDakota';

joined = JOIN sales BY cust_id,salesNA BY cust_id;

grouped = GROUP joined BY cust_id;

summed= FOREACH grouped GENERATE GROUP,SUM(sales.amount);

bigSpenders= FILTER summed BY 1$>1600;

DUMP sorted;

I get an error.


1 answer:

Answer 0 (score: 0)

From the Apache Pig docs:

Use the disambiguate operator (::) to identify field names after JOIN, COGROUP, CROSS, or FLATTEN operators.

The snippet below should be enough to achieve the goal; let me know if you run into any problems.

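-- Load both files as space-delimited text; sales_id (e.g. 1A) and date are read as chararray.
customers = LOAD 'customers.txt' USING PigStorage(' ') AS (cust_id:int, region:chararray, name:chararray);
sales = LOAD 'sales.txt' USING PigStorage(' ') AS (sales_id:chararray, cust_id:int, date:chararray, amount:int);
-- Keep only the NDakota customers, then join them to their sales.
custNA = FILTER customers BY region == 'NDakota';
joined = JOIN sales BY cust_id, custNA BY cust_id;
-- After the JOIN, the field can be qualified with the disambiguate operator (::).
req_data = FILTER joined BY sales::amount > 1600;
DUMP req_data;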
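
The snippet above filters individual sales rows. If the goal is instead the total amount per customer, as the GROUP and SUM steps in the question suggest, the disambiguate operator can be applied in the grouping and aggregation steps as well. The following is only a sketch under that assumption; the aliases `total` and `bigSpenders` and the file names are illustrative and would need adjusting to the real paths.

```pig
-- Sketch only: assumes the same space-delimited files and schemas as the answer above.
customers = LOAD 'customers.txt' USING PigStorage(' ')
            AS (cust_id:int, region:chararray, name:chararray);
sales     = LOAD 'sales.txt' USING PigStorage(' ')
            AS (sales_id:chararray, cust_id:int, date:chararray, amount:int);

custNA = FILTER customers BY region == 'NDakota';
joined = JOIN sales BY cust_id, custNA BY cust_id;

-- cust_id exists in both inputs, so after the JOIN it must be qualified with '::'.
grouped = GROUP joined BY sales::cust_id;

-- Inside each group the rows live in a bag named after the grouped relation (joined),
-- so the amount is projected from that bag before summing.
summed = FOREACH grouped GENERATE group AS cust_id,
                                  SUM(joined.sales::amount) AS total;

-- Filter on the named total instead of a positional reference.
bigSpenders = FILTER summed BY total > 1600;
DUMP bigSpenders;
```

With the sample rows from the question, only customer 3 is both in NDakota and present in the sales file, with a single sale of 2300, so that is the only customer expected to pass the filter. Naming the aggregated field (`total`) avoids positional references such as `$1`, which are easy to mistype.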