Hive: error when calculating a sum and then taking the maximum per group

Date: 2018-07-11 11:37:01

Tags: sql hive hiveql

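I want to run a query that finds, for each credit card, the month with the highest spend. To do that, I first need to calculate the total amount each card spent per month. I have a table, credit_transact, that contains the credit card transactions: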

processdate timestamp   ""
cardno_hash string  ""
amount  int ""
year    int ""
month   int ""

Made-up sample data:

card    year    month    amount
a123    2016    12       23160
a123    2016    10       287
c123    2016    11       5503
c123    2016    11       4206

The result I want:

card    year    month    amount
a123    2016    12       23160
c123    2016    11       9709
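The intended computation has two steps: sum each card's spending per month, then keep the month with the largest total. The plain-Python sketch below runs both steps on the made-up sample rows above. It groups by card and year, matching the outer GROUP BY in the attempted query. The tuple layout and variable names are illustrative assumptions, not part of the original question.

```python
from collections import defaultdict

# Made-up sample rows from the question: (card, year, month, amount)
rows = [
    ("a123", 2016, 12, 23160),
    ("a123", 2016, 10, 287),
    ("c123", 2016, 11, 5503),
    ("c123", 2016, 11, 4206),
]

# Step 1: total spend per (card, year, month)
monthly = defaultdict(int)
for card, year, month, amount in rows:
    monthly[(card, year, month)] += amount

# Step 2: for each (card, year), keep the month with the largest total
best = {}
for (card, year, month), total in monthly.items():
    key = (card, year)
    if key not in best or total > best[key][1]:
        best[key] = (month, total)

# Flatten back to (card, year, month, total) rows, sorted by card
result = sorted(
    (card, year, month, total) for (card, year), (month, total) in best.items()
)
for r in result:
    print(*r)
```

This prints the two desired rows: `a123 2016 12 23160` and `c123 2016 11 9709`.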

Note that year and month are partition columns.

I tried a subquery like this:

USE credit_card_db;
SELECT sum_amount_transact.cardno_hash, sum_amount_transact.year, sum_amount_transact.month, MAX(sum_amount_transact.sum_amount)
FROM
(
  SELECT cardno_hash, year, month, SUM(amount) AS sum_amount FROM credit_transact
  GROUP BY cardno_hash, year, month
) AS sum_amount_transact
GROUP BY sum_amount_transact.cardno_hash, sum_amount_transact.year;

However, it fails with the following error:

java.lang.Exception: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Line 0:-1 Invalid column reference 'month'

The following subquery works fine on its own and returns the expected results:

SELECT cardno_hash, year, month, SUM(amount) AS sum_amount FROM credit_transact
  GROUP BY cardno_hash, year, month

The result is:

card    year    month    amount
a123    2016    12       23160
a123    2016    10       287
c123    2016    11       9709
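The error comes from the outer query. It selects `month` but groups only by `cardno_hash` and `year`, and `month` is not inside an aggregate, so Hive has no single `month` value to return for each group. The subquery above already produces correct monthly totals. One standard-SQL way to finish without window functions is to compute the largest monthly total per card and year, then join that back to the monthly totals. This is a different technique from the answer below.

The sketch uses Python's built-in `sqlite3` module with an in-memory table as a stand-in for Hive. The trimmed schema (processdate omitted) is an assumption, and the query has not been verified on Hive itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: only the columns the query needs (processdate omitted)
conn.execute(
    "CREATE TABLE credit_transact (cardno_hash TEXT, amount INTEGER, year INTEGER, month INTEGER)"
)
conn.executemany(
    "INSERT INTO credit_transact VALUES (?, ?, ?, ?)",
    [("a123", 23160, 2016, 12), ("a123", 287, 2016, 10),
     ("c123", 5503, 2016, 11), ("c123", 4206, 2016, 11)],
)

query = """
SELECT m.cardno_hash, m.year, m.month, m.sum_amount
FROM (
  -- monthly totals, same as the working subquery
  SELECT cardno_hash, year, month, SUM(amount) AS sum_amount
  FROM credit_transact
  GROUP BY cardno_hash, year, month
) m
JOIN (
  -- largest monthly total per card and year
  SELECT cardno_hash, year, MAX(sum_amount) AS max_amount
  FROM (
    SELECT cardno_hash, year, month, SUM(amount) AS sum_amount
    FROM credit_transact
    GROUP BY cardno_hash, year, month
  ) t
  GROUP BY cardno_hash, year
) x
  ON m.cardno_hash = x.cardno_hash
 AND m.year = x.year
 AND m.sum_amount = x.max_amount
ORDER BY m.cardno_hash
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two months tie for the maximum, this join returns both of them rather than picking one.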

Any help with this would be greatly appreciated.

1 Answer:

Answer 0 (score: 1)

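I'm not entirely sure what you really want, but I'm pretty sure you want row_number(). I think you want the month with the highest total spend for each card in each year: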

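SELECT ct.*
FROM (SELECT cardno_hash, year, month, SUM(amount) AS sum_amount,
             -- number each card's months within a year by total spend, highest first
             ROW_NUMBER() OVER (PARTITION BY cardno_hash, year ORDER BY SUM(amount) DESC) as seqnum
      FROM credit_transact
      GROUP BY cardno_hash, year, month
     ) ct
-- keep only the top-spending month per card and year
WHERE seqnum = 1;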
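The sketch below sanity-checks the answer's query against the question's sample data. It runs the same SQL in SQLite through Python's built-in `sqlite3` module, used only as a stand-in for Hive. It assumes an SQLite build of 3.25 or later, the release that added window functions. The trimmed schema is also an assumption. Rows are sorted in Python because the query itself has no ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema (processdate omitted); SQLite stands in for Hive here
conn.execute(
    "CREATE TABLE credit_transact (cardno_hash TEXT, amount INTEGER, year INTEGER, month INTEGER)"
)
conn.executemany(
    "INSERT INTO credit_transact VALUES (?, ?, ?, ?)",
    [("a123", 23160, 2016, 12), ("a123", 287, 2016, 10),
     ("c123", 5503, 2016, 11), ("c123", 4206, 2016, 11)],
)

# The answer's query: aggregate per month, then number months per (card, year)
query = """
SELECT ct.*
FROM (SELECT cardno_hash, year, month, SUM(amount) AS sum_amount,
             ROW_NUMBER() OVER (PARTITION BY cardno_hash, year
                                ORDER BY SUM(amount) DESC) AS seqnum
      FROM credit_transact
      GROUP BY cardno_hash, year, month
     ) ct
WHERE seqnum = 1
"""
# Sort in Python so the printed order is deterministic
rows = sorted(conn.execute(query).fetchall())
for row in rows:
    print(row)  # (cardno_hash, year, month, sum_amount, seqnum)
```

ROW_NUMBER() keeps exactly one month per card and year, so when two months tie for the highest total, one of them is dropped arbitrarily. RANK() would keep all tied months.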