Is there a way to reshape data in Pig?
The data looks like this:
id | p1 | count
1 | "Accessory" | 3
1 | "clothing" | 2
2 | "Books" | 1
I want to reshape the data so that the output looks like this:
id | Accessory | clothing | Books
1 | 3 | 2 | 0
2 | 0 | 0 | 1
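Can anyone suggest an approach?
Answer 0 (score: 1)
If the set of products is fixed, the code below may help; otherwise you could write a custom UDF to get the same result.
Input: a.csv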
1|Accessory|3
1|Clothing|2
2|Books|1
Pig Snippet:
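-- Load the pipe-delimited input with an explicit schema.
test = LOAD 'a.csv' USING PigStorage('|') AS (product_id:long, product_name:chararray, rec_cnt:long);
-- Group by id, then pick out each product's rows inside the nested FOREACH.
req_stats = FOREACH (GROUP test BY product_id) {
    -- String comparison is case-sensitive; the literals must match the input exactly.
    accessory = FILTER test BY product_name == 'Accessory';
    clothing = FILTER test BY product_name == 'Clothing';
    books = FILTER test BY product_name == 'Books';
    -- BagToString returns a chararray, so the default for a missing product is the string '0'.
    GENERATE group AS product_id,
        (IsEmpty(accessory) ? '0' : BagToString(accessory.rec_cnt)) AS a_cnt,
        (IsEmpty(clothing) ? '0' : BagToString(clothing.rec_cnt)) AS c_cnt,
        (IsEmpty(books) ? '0' : BagToString(books.rec_cnt)) AS b_cnt;
};
DUMP req_stats;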
Output of DUMP req_stats:
(1,3,2,0)
(2,0,0,1)
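
A possible variation, offered as a sketch and not as part of the original answer: if an id can have more than one row for the same product, BagToString would concatenate the counts into one string (joined with its default delimiter, which I believe is an underscore) instead of adding them. One way around that, assuming the same a.csv and the same three fixed product names, is to turn each row into per-product numeric columns first and then SUM them per id. The aliases `flagged`, `grouped` and `pivoted` are names I made up for illustration.

```pig
-- Assumes the same pipe-delimited a.csv and schema as the snippet above.
test = LOAD 'a.csv' USING PigStorage('|')
       AS (product_id:long, product_name:chararray, rec_cnt:long);

-- One numeric column per known product; rows of other products contribute 0.
flagged = FOREACH test GENERATE
    product_id,
    (product_name == 'Accessory' ? rec_cnt : 0L) AS a_cnt,
    (product_name == 'Clothing' ? rec_cnt : 0L) AS c_cnt,
    (product_name == 'Books' ? rec_cnt : 0L) AS b_cnt;

-- Add the columns up per id, so repeated rows are summed rather than concatenated.
grouped = GROUP flagged BY product_id;
pivoted = FOREACH grouped GENERATE
    group AS product_id,
    SUM(flagged.a_cnt) AS a_cnt,
    SUM(flagged.c_cnt) AS c_cnt,
    SUM(flagged.b_cnt) AS b_cnt;

DUMP pivoted;
```

With this version the count columns stay long values instead of chararray, which may be easier to work with downstream. It still requires the product names to be known in advance; a truly dynamic set of columns would need a UDF or a step outside Pig.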