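I want to run a query that computes the maximum monthly spend for each credit card. For each card, I first need to compute the total amount spent per month. I have a table, credit_transact, that contains the credit card transactions: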
processdate timestamp
cardno_hash string
amount int
year int
month int
Made-up sample data:
card year month amount
a123 2016 12 23160
a123 2016 10 287
c123 2016 11 5503
c123 2016 11 4206
What I want:
card year month amount
a123 2016 12 23160
c123 2016 11 9709
Importantly, year and month are partition columns.
I tried the following subquery:
USE credit_card_db;
SELECT sum_amount_transact.cardno_hash, sum_amount_transact.year, sum_amount_transact.month, MAX(sum_amount_transact.sum_amount)
FROM
(
SELECT cardno_hash, year, month, SUM(amount) AS sum_amount FROM credit_transact
GROUP BY cardno_hash, year, month
) AS sum_amount_transact
GROUP BY sum_amount_transact.cardno_hash, sum_amount_transact.year;
However, it fails with the following error:
java.lang.Exception: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Line 0:-1 Invalid column reference 'month'
The following subquery on its own works fine and returns the expected results:
SELECT cardno_hash, year, month, SUM(amount) AS sum_amount FROM credit_transact
GROUP BY cardno_hash, year, month
The result is:
card year month amount
a123 2016 12 23160
a123 2016 10 287
c123 2016 11 9709
I would greatly appreciate any help with this.
Answer 0 (score: 1)
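I'm not entirely sure what you really want, but I'm fairly sure you want row_number(). I assume you want the month with the highest total for each card and year: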
SELECT ct.*
FROM (SELECT cardno_hash, year, month, SUM(amount) AS sum_amount,
ROW_NUMBER() OVER (PARTITION BY cardno_hash, year ORDER BY SUM(amount) DESC) as seqnum
FROM credit_transact
GROUP BY cardno_hash, year, month
) ct
WHERE seqnum = 1;
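As background on the error in the question: the outer query selects month but groups only by cardno_hash and year, and Hive rejects a non-aggregated column in the SELECT list that is missing from GROUP BY. A minimal sketch of the ranking approach follows, written so it can be tried without the real table. It assumes a Hive version that supports CTEs and SELECT without FROM; sample_transact is a hypothetical inline copy of the question's sample rows, and RANK() is swapped in for ROW_NUMBER() so that months tied for the highest total are all kept.

```sql
-- Hypothetical inline copy of the sample rows from the question,
-- so the query can be tried without the real credit_transact table.
WITH sample_transact AS (
  SELECT 'a123' AS cardno_hash, 2016 AS year, 12 AS month, 23160 AS amount
  UNION ALL SELECT 'a123', 2016, 10, 287
  UNION ALL SELECT 'c123', 2016, 11, 5503
  UNION ALL SELECT 'c123', 2016, 11, 4206
),
monthly AS (
  -- Step 1: total spend per card per month.
  SELECT cardno_hash, year, month, SUM(amount) AS sum_amount
  FROM sample_transact
  GROUP BY cardno_hash, year, month
)
-- Step 2: keep the month(s) with the highest total per card and year.
SELECT cardno_hash, year, month, sum_amount
FROM (
  SELECT m.*,
         RANK() OVER (PARTITION BY cardno_hash, year
                      ORDER BY sum_amount DESC) AS rnk
  FROM monthly m
) ranked
WHERE rnk = 1;
```

On the sample rows this should return a123 / 2016 / 12 / 23160 and c123 / 2016 / 11 / 9709 (row order is not guaranteed), matching the desired output in the question. Use ROW_NUMBER() instead if exactly one row per card and year is required even when two months tie.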