So I have these particular columns that I'm working with: customer_token, merchant_id, merchant_category_code and transaction_amount.
My current query is:
SELECT customer_token, COUNT(transaction_amount), SUM(transaction_amount)
FROM transaction
WHERE file_date>20121031
and file_date<20121201
GROUP BY customer_token
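I want to add a part to the query above that, in the results, breaks merchant_category_code out into separate columns based on the transaction amounts in each particular merchant_category_code. The result would look like this: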
customer_token, count(transaction_amount), sum(transaction_amount), count(transaction_amount in merchant_category_code #1), count(transaction_amount in merchant_category_code #2), count(transaction_amount in merchant_category_code #3), etc.
And then this:
customer_token, count(transaction_amount), sum(transaction_amount), sum(transaction_amount in merchant_category_code #1), sum(transaction_amount in merchant_category_code #2), sum(transaction_amount in merchant_category_code #3), etc.
But I'm at a loss as to how to do this, or even whether it's possible.
Answer (score: 2)
If you know in advance what the possible values of merchant_category_code are, you can use CASE expressions:
SELECT customer_token,
       COUNT(transaction_amount),
       SUM(transaction_amount),
       COUNT(CASE WHEN merchant_category_code = 1 THEN transaction_amount END),
       COUNT(CASE WHEN merchant_category_code = 2 THEN transaction_amount END),
       COUNT(CASE WHEN merchant_category_code = 3 THEN transaction_amount END),
       -- ... one COUNT(CASE ...) column per remaining code ...
       SUM(CASE WHEN merchant_category_code = 1 THEN transaction_amount END),
       SUM(CASE WHEN merchant_category_code = 2 THEN transaction_amount END),
       SUM(CASE WHEN merchant_category_code = 3 THEN transaction_amount END)
       -- ... one SUM(CASE ...) column per remaining code (no comma after the last column) ...
FROM transaction
WHERE file_date BETWEEN 20121101 AND 20121130
GROUP BY customer_token
;
(Or IF expressions, if you prefer; for documentation on both, see the section titled "Conditional Functions" on the page "LanguageManual+UDF" in the Hive wiki.)
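If you want to try the CASE approach locally, or you don't know the codes in advance, here is a minimal sketch. It assumes Python's built-in sqlite3 as a stand-in for Hive (the CASE expressions are the same), plus a hypothetical sample table and hypothetical column aliases cnt_N / sum_N. A single query always returns a fixed set of columns, so the sketch works in two steps: it first reads the distinct codes for the date range, then generates one COUNT(CASE ...) and one SUM(CASE ...) column per code. With Hive you would do the same generation in a driver script.

```python
import sqlite3

# Hypothetical sample rows: (customer_token, merchant_category_code, transaction_amount, file_date)
ROWS = [
    ("alice", 1, 10.0, 20121105),
    ("alice", 1, 5.0, 20121110),
    ("alice", 2, 7.5, 20121115),
    ("bob", 3, 20.0, 20121120),
    ("bob", 1, 1.0, 20121215),  # December: excluded by the date filter
]

DATE_FILTER = "WHERE file_date BETWEEN 20121101 AND 20121130"


def build_pivot_sql(codes):
    """Build the conditional-aggregation query: one COUNT and one SUM column per code."""
    # int() keeps arbitrary text from being spliced into the SQL string.
    count_cols = [
        f"COUNT(CASE WHEN merchant_category_code = {int(c)} "
        f"THEN transaction_amount END) AS cnt_{int(c)}"
        for c in codes
    ]
    sum_cols = [
        f"SUM(CASE WHEN merchant_category_code = {int(c)} "
        f"THEN transaction_amount END) AS sum_{int(c)}"
        for c in codes
    ]
    select_list = ",\n       ".join(
        ["customer_token",
         "COUNT(transaction_amount) AS cnt_all",
         "SUM(transaction_amount) AS sum_all"]
        + count_cols + sum_cols
    )
    # "transaction" is quoted because it is an SQL keyword.
    return (f'SELECT {select_list}\nFROM "transaction"\n{DATE_FILTER}\n'
            "GROUP BY customer_token\nORDER BY customer_token")


def pivot_by_code(rows, codes=None):
    """Load rows into an in-memory table and run the generated pivot query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE "transaction" (customer_token TEXT, '
                     "merchant_category_code INTEGER, transaction_amount REAL, "
                     "file_date INTEGER)")
        conn.executemany('INSERT INTO "transaction" VALUES (?, ?, ?, ?)', rows)
        if codes is None:
            # Step 1: discover which codes actually occur in the date range.
            codes = [r[0] for r in conn.execute(
                'SELECT DISTINCT merchant_category_code FROM "transaction" '
                f"{DATE_FILTER} ORDER BY merchant_category_code")]
        # Step 2: run the pivot with one column pair per discovered code.
        return conn.execute(build_pivot_sql(codes)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in pivot_by_code(ROWS):
        print(row)
```

Note that a code with no matching rows for a customer yields 0 in its COUNT column but NULL (None in Python) in its SUM column. If you'd rather see zeros, wrap the SUM in COALESCE(..., 0), or add ELSE 0 to the SUM's CASE only (adding it to the COUNT's CASE would make COUNT count every row).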