Reshaping data in Pig - changing row values to column names

Time: 2017-02-24 01:08:36

Tags: hadoop apache-pig

Is there a way to reshape data in Pig?

The data looks like this:

id | p1 | count   
1  | "Accessory" | 3    
1  | "clothing" | 2     
2  | "Books" | 1   

I want to reshape the data so that the output looks like this:

id | Accessory | clothing | Books    
1  | 3  |  2 | 0    
2  | 0  |  0 | 1

Can anyone offer some suggestions?

1 answer:

Answer 0: (score: 1)

If the set of products is fixed, the code below may help. Otherwise, you can write a custom UDF to achieve the same result.

Input: a.csv

1|Accessory|3
1|Clothing|2
2|Books|1

Pig Snippet:

-- Load the pipe-delimited input with an explicit schema.
test = LOAD 'a.csv' USING PigStorage('|') AS (product_id:long, product_name:chararray, rec_cnt:long);
req_stats = FOREACH (GROUP test BY product_id) {
    -- One filtered bag per known product name.
    accessory = FILTER test BY product_name == 'Accessory';
    clothing = FILTER test BY product_name == 'Clothing';
    books = FILTER test BY product_name == 'Books';
    -- Emit '0' when a product is absent for this id. BagToString assumes
    -- at most one row per (product_id, product_name).
    GENERATE group AS product_id,
        (IsEmpty(accessory) ? '0' : BagToString(accessory.rec_cnt)) AS a_cnt,
        (IsEmpty(clothing) ? '0' : BagToString(clothing.rec_cnt)) AS c_cnt,
        (IsEmpty(books) ? '0' : BagToString(books.rec_cnt)) AS b_cnt;
};

DUMP req_stats;

Output of DUMP req_stats:

(1,3,2,0)
(2,0,0,1)
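
For the case where the set of products is not fixed, the answer suggests a custom UDF without showing one. The following is only a rough sketch, in plain Python, of the pivot logic such a UDF might implement; it is not a Pig UDF itself, and wiring it into Pig is not shown. The function names pivot_counts and distinct_categories are hypothetical, chosen for illustration. Unlike the snippet above, this sketch sums counts when the same product appears more than once for an id.

```python
def distinct_categories(rows):
    """Collect category names in first-seen order, so columns need not be hard-coded."""
    seen = []
    for _, name, _ in rows:
        if name not in seen:
            seen.append(name)
    return seen


def pivot_counts(rows, categories):
    """Pivot (id, name, count) rows into one tuple per id with one column per category."""
    index = {name: i for i, name in enumerate(categories)}
    pivoted = {}
    for row_id, name, count in rows:
        # Start every id with a zero in each category column.
        counts = pivoted.setdefault(row_id, [0] * len(categories))
        if name in index:
            counts[index[name]] += count
    return [(row_id,) + tuple(counts) for row_id, counts in sorted(pivoted.items())]


# Same data as a.csv above.
rows = [(1, "Accessory", 3), (1, "Clothing", 2), (2, "Books", 1)]
categories = distinct_categories(rows)
result = pivot_counts(rows, categories)
print(categories)  # ['Accessory', 'Clothing', 'Books']
print(result)      # [(1, 3, 2, 0), (2, 0, 0, 1)]
```

The column order here follows the order in which product names first appear in the input, so a real job would probably want to sort or otherwise fix that order to keep the output schema stable between runs.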