I'm trying to multiply two fields and then get their sum after joining three tables in Pig, but I keep getting this error:
<file loyalty_program.pig, line 30, column 74> (Name: Multiply Type: null Uid: null)incompatible types in Multiply Operator left hand side:bag :tuple(new_details1::new_details::potential_customers::num_of_orders:long) right hand side:bag :tuple(products::price:int)
-- load the data sets
orders = LOAD '/dualcore/orders' AS (order_id:int,
cust_id:int,
order_dtm:chararray);
details = LOAD '/dualcore/order_details' AS (order_id:int,
prod_id:int);
products = LOAD '/dualcore/products' AS (prod_id:int,
brand:chararray,
name:chararray,
price:int,
cost:int,
shipping_wt:int);
recent = FILTER orders by order_dtm matches '2012-.*$';
customer = GROUP recent by cust_id;
cust_orders = FOREACH customer GENERATE group as cust_id, (int)COUNT(recent) as num_of_orders;
potential_customers = FILTER cust_orders by num_of_orders>=5;
new_details = join potential_customers by cust_id, recent by cust_id;
new_details1 = join new_details by order_id, details by order_id;
new_details2 = join new_details1 by prod_id, products by prod_id;
--DESCRIBE new_details2;
final_details = FOREACH new_details2 GENERATE potential_customers::cust_id, potential_customers::num_of_orders as num_of_orders,recent::order_id as order_id,recent::order_dtm,details::prod_id,products::brand,products::name,products::price as price,products::cost,products::shipping_wt;
grouped_data = GROUP final_details by cust_id;
member = FOREACH grouped_data GENERATE SUM(final_details.num_of_orders * final_details.price) ;
lim = limit member 10;
dump lim;
I even cast the result of COUNT to int, but it still throws this error. I don't know how to go about this.
Answer (score: 0)
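OK, I think you first want to multiply the number of orders by the price of each product, and then you want the sum of those products.
Even though it's an unusual requirement, you can use the following approach.
All you need to do is compute the multiplication in the final_details FOREACH statement itself, and then simply apply SUM to that product field. The error occurs because, inside FOREACH grouped_data, final_details.num_of_orders and final_details.price are bags (one value per tuple in the group), and the * operator only works on scalar values. That is exactly what the message reports ("left hand side:bag ... right hand side:bag"). Casting COUNT to int does not change this.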
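To illustrate why the multiplication has to happen per tuple, before the GROUP, here is a small plain-Python analogy (not Pig). Python lists stand in for Pig bags, and the rows and the function names sum_of_products and multiply_bags are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def sum_of_products(rows):
    """rows: iterable of (cust_id, num_of_orders, price) tuples (hypothetical data).

    Mirrors the fix: compute num_of_orders * price per row (the multiplied_price
    field in the FOREACH), then group by cust_id and sum those values.
    """
    totals = defaultdict(int)
    for cust_id, num_of_orders, price in rows:
        multiplied_price = num_of_orders * price  # scalar * scalar, one row at a time
        totals[cust_id] += multiplied_price       # equivalent to SUM over each group
    return dict(totals)


def multiply_bags(rows, cust_id):
    """Mirrors the failing version: project two 'bags' out of one group, then multiply them."""
    group = [r for r in rows if r[0] == cust_id]
    nums = [r[1] for r in group]    # like final_details.num_of_orders (a bag)
    prices = [r[2] for r in group]  # like final_details.price (a bag)
    return nums * prices            # list * list raises TypeError, much like Pig's Multiply error


rows = [(100, 6, 3000), (100, 6, 1000), (101, 3, 4000)]
print(sum_of_products(rows))  # prints {100: 24000, 101: 12000}
```

Calling multiply_bags(rows, 100) fails for the same structural reason as the Pig script: the operands are collections, not single values.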
Based on your LOAD statements, I created the following input files:
main_orders.txt
6666,100,2012-01-01
7777,101,2012-09-02
8888,100,2012-01-09
9999,101,2012-12-08
6666,101,2012-09-02
9999,100,2012-07-12
9999,100,2012-08-01
6666,100,2012-01-02
7777,100,2012-09-09
orders_details.txt
6666,6000
7777,7000
8888,8000
9999,9000
main_products.txt
6000,Nike,Shoes,3000,3000,1
7000,Adidas,Cap,1000,1000,1
8000,Rebook,Shoes,4000,4000,1
9000,Puma,Shoes,25000,2500,1
Here is the code:
orders = LOAD '/user/cloudera/inputfiles/main_orders.txt' USING PigStorage(',') AS (order_id:int,cust_id:int,order_dtm:chararray);
details = LOAD '/user/cloudera/inputfiles/orders_details.txt' USING PigStorage(',') AS (order_id:int,prod_id:int);
products = LOAD '/user/cloudera/inputfiles/main_products.txt' USING PigStorage(',') AS(prod_id:int,brand:chararray,name:chararray,price:int,cost:int,shipping_wt:int);
recent = FILTER orders by order_dtm matches '2012-.*';
customer = GROUP recent by cust_id;
cust_orders = FOREACH customer GENERATE group as cust_id, (int)COUNT(recent) as num_of_orders;
potential_customers = FILTER cust_orders by num_of_orders>=5;
new_details = join potential_customers by cust_id, recent by cust_id;
new_details1 = join new_details by order_id, details by order_id;
new_details2 = join new_details1 by prod_id, products by prod_id;
DESCRIBE new_details2;
final_details = FOREACH new_details2 GENERATE
    potential_customers::cust_id,
    potential_customers::num_of_orders as num_of_orders,
    recent::order_id as order_id,
    recent::order_dtm,
    details::prod_id,
    products::brand,
    products::name,
    products::price as price,
    products::cost,
    products::shipping_wt,
    (potential_customers::num_of_orders * products::price) as multiplied_price; -- the multiplication happens here, in the last field
dump final_details;
grouped_data = GROUP final_details by cust_id;
member = FOREACH grouped_data GENERATE SUM(final_details.multiplied_price) ;
lim = limit member 10;
dump lim;
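If you want to sanity-check the result without a Hadoop cluster, the following is a plain-Python emulation of the script above on the same three sample files. It is an illustrative assumption, not Pig itself and not how Pig executes joins; run_pipeline and the in-memory lists are hypothetical names. Like Pig's inner JOIN, it keeps every matching combination of rows.

```python
import re
from collections import Counter, defaultdict

# Sample data copied from main_orders.txt, orders_details.txt and main_products.txt above
ORDERS = [  # (order_id, cust_id, order_dtm)
    (6666, 100, "2012-01-01"), (7777, 101, "2012-09-02"), (8888, 100, "2012-01-09"),
    (9999, 101, "2012-12-08"), (6666, 101, "2012-09-02"), (9999, 100, "2012-07-12"),
    (9999, 100, "2012-08-01"), (6666, 100, "2012-01-02"), (7777, 100, "2012-09-09"),
]
DETAILS = [(6666, 6000), (7777, 7000), (8888, 8000), (9999, 9000)]  # (order_id, prod_id)
PRODUCTS = [  # (prod_id, brand, name, price, cost, shipping_wt)
    (6000, "Nike", "Shoes", 3000, 3000, 1),
    (7000, "Adidas", "Cap", 1000, 1000, 1),
    (8000, "Rebook", "Shoes", 4000, 4000, 1),
    (9000, "Puma", "Shoes", 25000, 2500, 1),
]


def run_pipeline(orders, details, products, min_orders=5):
    """Plain-Python emulation of the Pig script; returns {cust_id: SUM(multiplied_price)}."""
    # recent = FILTER orders BY order_dtm MATCHES '2012-.*'  (MATCHES is a whole-string match)
    recent = [o for o in orders if re.fullmatch(r"2012-.*", o[2])]

    # cust_orders = COUNT per cust_id; potential_customers = those with >= min_orders
    counts = Counter(cust_id for _, cust_id, _ in recent)
    potential = {c: n for c, n in counts.items() if n >= min_orders}

    # Index the right-hand sides of the inner joins (lists keep duplicate keys, like JOIN)
    prods_by_order = defaultdict(list)
    for order_id, prod_id in details:
        prods_by_order[order_id].append(prod_id)
    prices_by_prod = defaultdict(list)
    for prod_id, _brand, _name, price, _cost, _wt in products:
        prices_by_prod[prod_id].append(price)

    totals = defaultdict(int)
    for order_id, cust_id, _dtm in recent:         # join potential_customers with recent
        if cust_id not in potential:
            continue
        for prod_id in prods_by_order[order_id]:   # join details BY order_id
            for price in prices_by_prod[prod_id]:  # join products BY prod_id
                # multiplied_price is computed per joined row, before any grouping
                totals[cust_id] += potential[cust_id] * price
    # GROUP final_details BY cust_id, then SUM(final_details.multiplied_price)
    return dict(totals)


print(run_pipeline(ORDERS, DETAILS, PRODUCTS))  # prints {100: 366000}
```

Only customer 100 has at least 5 orders in 2012 (6 orders; customer 101 has 3), so the result contains a single total, matching the Pig output.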
For clarity, I also dumped the output of the final_details FOREACH statement:
(100,6,6666,2012-01-01,6000,Nike,Shoes,3000,3000,1,18000)
(100,6,6666,2012-01-02,6000,Nike,Shoes,3000,3000,1,18000)
(100,6,7777,2012-09-09,7000,Adidas,Cap,1000,1000,1,6000)
(100,6,8888,2012-01-09,8000,Rebook,Shoes,4000,4000,1,24000)
(100,6,9999,2012-07-12,9000,Puma,Shoes,25000,2500,1,150000)
(100,6,9999,2012-08-01,9000,Puma,Shoes,25000,2500,1,150000)
The final output is:
(366000)
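The total can be checked by hand from the six dumped rows. Customer 100 has num_of_orders = 6, so every multiplied_price is 6 times that row's price:

```latex
\begin{aligned}
\text{total} &= 6 \times (3000 + 3000 + 1000 + 4000 + 25000 + 25000) \\
             &= 6 \times 61000 = 366000
\end{aligned}
```

Because num_of_orders is constant within one customer's group, SUM(num_of_orders * price) equals num_of_orders * SUM(price) for that customer.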
This code may help you, but please clarify your requirement.