I have the following data:
SELECT
mtrans.merch_num,
mtrans.card_num
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%' AND person_org_code='P' AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30;
+-----------+----------------------------+
| merch_num | card_num |
+-----------+----------------------------+
| 1 | 4658XXXXXXXXXXXXXXXXXXURMX |
| 2 | 4658XXXXXXXXXXXXXXXXXXIE6X |
| 2 | 4658XXXXXXXXXXXXXXXXXXDA8X |
| 2 | 4658XXXXXXXXXXXXXXXXXX7D1X |
| 2 | 4658XXXXXXXXXXXXXXXXXXTJ2X |
| 2 | 4658XXXXXXXXXXXXXXXXXXQQWX |
| 2 | 4659XXXXXXXXXXXXXXXXXXY4EX |
| 2 | 4658XXXXXXXXXXXXXXXXXXRDOX |
| 2 | 4658XXXXXXXXXXXXXXXXXX0O3X |
| 2 | 4658XXXXXXXXXXXXXXXXXXNVBX |
+-----------+----------------------------+
I want to aggregate trans_amt by merch_num, but only for merchants that have more than one distinct card_num.
In a simple query, I can do it like this:
SELECT
mtrans.merch_num,
FROM_UNIXTIME(UNIX_TIMESTAMP(),'MMM-yyyy') AS process_month,
SUM(mtrans.trans_amt) AS total_age_less_30_1
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%' AND person_org_code='P' AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30
GROUP BY
mtrans.merch_num having count(distinct mtrans.card_num) > 1;
+-----------+---------------+---------------------+
| merch_num | process_month | total_age_less_30_1 |
+-----------+---------------+---------------------+
| 2 | Nov-2017 | 2147.5 |
+-----------+---------------+---------------------+
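Here I can skip merchant 5493036, because it does not have more than one unique card.
But I have several such conditions and want to write only one query. Using a CASE statement, I can do it like this: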
SELECT mtrans.merch_num,
FROM_UNIXTIME(UNIX_TIMESTAMP(),'MMM-yyyy') AS process_month,
NVL(SUM(CASE
WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30)
THEN mtrans.trans_amt ELSE 0 END), NULL)
AS total_age_less_30_1,
NVL(SUM(CASE
WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) >= 30
AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 40)
THEN mtrans.trans_amt ELSE 0 END), NULL)
AS total_age_30_40_1
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%'
AND person_org_code='P'
GROUP BY
mtrans.merch_num;
+-----------+---------------+---------------------+-------------------+
| merch_num | process_month | total_age_less_30_1 | total_age_30_40_1 |
+-----------+---------------+---------------------+-------------------+
| 3 | Nov-2017 | 0 | 0 |
| 4 | Nov-2017 | 0 | 0 |
| 1 | Nov-2017 | 2.49 | 203.68 |
| 2 | Nov-2017 | 2147.5 | 4907 |
| 5 | Nov-2017 | 0 | 0 |
+-----------+---------------+---------------------+-------------------+
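I want 2.49 to be NULL, because that merchant does not have more than one unique card in that age group.
I can't work out how to apply the condition so that SUM(trans_amt) is shown only when the number of unique card numbers is greater than 1.
When I add it as an AND condition inside the CASE statement, I get the following error: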
SELECT
mtrans.merch_num,
FROM_UNIXTIME(UNIX_TIMESTAMP(),'MMM-yyyy') AS process_month,
NVL(SUM(CASE
WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30 and count(distinct mtrans.card_num) > 1)
THEN mtrans.trans_amt ELSE 0 END), NULL)
AS total_age_less_30_1,
NVL(SUM(CASE
WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) >= 30
AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 40 and count(distinct mtrans.card_num) > 1)
THEN mtrans.trans_amt ELSE 0 END), NULL)
AS total_age_30_40_1
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%'
AND person_org_code='P'
GROUP BY
mtrans.merch_num;
ERROR: AnalysisException: aggregate function must not contain aggregate parameters: sum(CASE WHEN (round(datediff(mtrans.transaction_date, cdemo.date_birth) / 365) < 30 AND count(DISTINCT mtrans.card_num) > 1) THEN mtrans.trans_amt ELSE 0 END)
Can anyone help?
Answer 0 (score: 0)
The error seems to occur because you have a COUNT inside the SUM. Here is something to try; let me know how it goes:
SELECT
mtrans.merch_num,
FROM_UNIXTIME(UNIX_TIMESTAMP(),'MMM-yyyy') AS process_month,
-- The row-level age test stays inside SUM; the per-merchant card count is checked outside it
NVL(CASE
WHEN COUNT(DISTINCT mtrans.card_num) > 1
THEN SUM(CASE WHEN ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30
THEN mtrans.trans_amt ELSE 0 END)
ELSE 0 END, NULL)
AS total_age_less_30_1,
NVL(CASE
WHEN COUNT(DISTINCT mtrans.card_num) > 1
THEN SUM(CASE WHEN ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) >= 30
AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 40
THEN mtrans.trans_amt ELSE 0 END)
ELSE 0 END, NULL)
AS total_age_30_40_1
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%'
AND person_org_code='P'
GROUP BY
mtrans.merch_num;
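One way to get NULL for an age bucket that has at most one distinct card is to keep the row-level age test inside the aggregates and check the distinct count in an outer CASE. The sketch below is only an illustration under stated assumptions: it runs on SQLite through Python's sqlite3 module instead of Impala, and the table trans with its precomputed age column is a hypothetical stand-in for the three joined tables and the DATEDIFF expression. Older Impala releases are documented as allowing only one COUNT(DISTINCT ...) expression per query, so check your version before using two of them as shown here.

```python
import sqlite3

# Hypothetical toy data: "trans" and "age" stand in for the joined tables
# and the ROUND(DATEDIFF(...)/365) expression from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trans (merch_num INTEGER, card_num TEXT, age INTEGER, trans_amt REAL)"
)
conn.executemany(
    "INSERT INTO trans VALUES (?, ?, ?, ?)",
    [
        (1, "A", 25, 2.5),    # merchant 1: a single card under 30
        (1, "B", 35, 100.0),  # merchant 1: two cards aged 30-39
        (1, "C", 36, 50.0),
        (2, "D", 22, 10.0),   # merchant 2: two cards under 30
        (2, "E", 28, 20.0),
        (2, "F", 33, 5.0),    # merchant 2: a single card aged 30-39
    ],
)

# The distinct count is taken per age bucket; a CASE without ELSE yields NULL.
query = """
SELECT merch_num,
       CASE WHEN COUNT(DISTINCT CASE WHEN age < 30 THEN card_num END) > 1
            THEN SUM(CASE WHEN age < 30 THEN trans_amt END)
       END AS total_age_less_30,
       CASE WHEN COUNT(DISTINCT CASE WHEN age >= 30 AND age < 40 THEN card_num END) > 1
            THEN SUM(CASE WHEN age >= 30 AND age < 40 THEN trans_amt END)
       END AS total_age_30_40
FROM trans
GROUP BY merch_num
ORDER BY merch_num
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this toy data, merchant 1 gets NULL for the under-30 bucket and merchant 2 gets NULL for the 30-40 bucket, because each of those buckets contains only one distinct card.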
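Answer 1 (score: 0)
I would suggest a cleaner way of doing this, as follows.
(PS: I don't have access to Hive, so I did this in regular SQL using PostgreSQL. It should be fairly easy to adapt to Hive SQL.)
Here is the table I created and the records I inserted into it.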
CREATE TEMPORARY TABLE hivetest (
merchant_id INTEGER,
card_number TEXT,
customer_dob TIMESTAMP,
transaction_dt TIMESTAMP,
transaction_amt DECIMAL
);
INSERT INTO hivetest VALUES
(1, 'A', '1997-12-01', '2017-11-01', 10.0),
(2, 'A', '1997-12-01', '2017-11-01', 11.0),
(2, 'B', '1980-12-01', '2017-11-01', 12.0),
(3, 'A', '1997-12-01', '2017-11-01', 13.0),
(3, 'A', '1997-12-01', '2017-11-01', 14.0),
(4, 'A', '1997-12-01', '2017-11-01', 15.0),
(4, 'C', '1980-12-01', '2017-11-01', 16.0);
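First, you need to join the tables and produce a dataset that gives you transaction_age (transaction_dt - customer_dob). My single table already holds most of the data needed for the date subtraction, but a simple INNER JOIN should be enough to get the same thing from your tables. Either way, here is the query.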
SELECT
merchant_id, card_number, DATE(customer_dob) customer_dob, DATE(transaction_dt) transaction_dt,
DATE_PART('year', DATE(transaction_dt)) - DATE_PART('year', DATE(customer_dob)) transaction_age,
transaction_amt
FROM hivetest ORDER BY 1;
This produces the following data.
+-------------+-------------+--------------+----------------+-----------------+----------------+
| merchant_id | card_number | customer_dob | transaction_dt | transaction_age |transaction_amt |
+-------------+-------------+--------------+----------------+-----------------+----------------+
| 1 | A | 1997-12-01 | 2017-11-01 | 20 | 10.0 |
| 2 | A | 1997-12-01 | 2017-11-01 | 20 | 11.0 |
| 2 | B | 1980-12-01 | 2017-11-01 | 37 | 12.0 |
| 3 | A | 1997-12-01 | 2017-11-01 | 20 | 13.0 |
| 3 | A | 1997-12-01 | 2017-11-01 | 20 | 14.0 |
| 4 | A | 1997-12-01 | 2017-11-01 | 20 | 15.0 |
| 4 | C | 1980-12-01 | 2017-11-01 | 37 | 16.0 |
+-------------+-------------+--------------+----------------+-----------------+----------------+
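The dataset above lets you bucket the transaction amounts by transaction_age as needed. The trick is to use the query above as a subquery and classify its results. Here is the query that does this.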
SELECT
merchant_id,
-- Transaction age of 30 or less
SUM(CASE WHEN transaction_age <= 30 THEN 1 ELSE 0 END) count_30,
SUM(CASE WHEN transaction_age <= 30 THEN transaction_amt ELSE 0 END) sum_30,
-- Transaction Age between 30 and 40
SUM(CASE WHEN transaction_age > 30 AND transaction_age <= 40 THEN 1 ELSE 0 END) case_30_40,
SUM(CASE WHEN transaction_age > 30 AND transaction_age <= 40 THEN transaction_amt ELSE 0 END) sum_30_40
FROM
(
SELECT
merchant_id, transaction_amt,
DATE_PART('year', DATE(transaction_dt)) - DATE_PART('year', DATE(customer_dob)) transaction_age
FROM hivetest
) m
GROUP BY merchant_id ORDER BY 1;
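This produces the bucketed output shown below, which gives the number of transactions and the sum of transaction amounts in each category for every merchant: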
+-------------+----------+--------+------------+-----------+
| merchant_id | count_30 | sum_30 | case_30_40 | sum_30_40 |
+-------------+----------+--------+------------+-----------+
| 1 | 1 | 10.0 | 0 | 0 |
| 2 | 1 | 11.0 | 1 | 12.0 |
| 3 | 2 | 27.0 | 0 | 0 |
| 4 | 1 | 15.0 | 1 | 16.0 |
+-------------+----------+--------+------------+-----------+
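This dataset is more or less the final result. However, according to your requirement, you are only interested in merchants with more than one unique card (COUNT(DISTINCT card_number) > 1).
So let's write another query that gives us that. The query below computes this and sets a flag to TRUE or FALSE to indicate whether we are interested in that merchant.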
SELECT
merchant_id,
CASE
WHEN COUNT(DISTINCT card_number) > 1 THEN
TRUE
ELSE
FALSE
END has_distinct_cards_gt_1
FROM hivetest GROUP BY merchant_id ORDER BY 1;
This gives the following output.
+-------------+-------------------------+
| merchant_id | has_distinct_cards_gt_1 |
+-------------+-------------------------+
| 1 | false |
| 2 | true |
| 3 | false |
| 4 | true |
+-------------+-------------------------+
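Now we are almost done. We only need to join these two result sets and then, based on has_distinct_cards_gt_1, show the columns from the previously generated dataset accordingly.
Here is the final join query and its result set.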
SELECT
merchants_all.merchant_id,
-- Age < 30
CASE
WHEN merchants_cards.has_distinct_cards_gt_1 THEN
sum_30
ELSE
0
END total_sum_30,
-- Age in 30 and 40
CASE
WHEN merchants_cards.has_distinct_cards_gt_1 THEN
sum_30_40
ELSE
0
END total_sum_30_40
FROM
(
SELECT
merchant_id,
SUM(CASE WHEN transaction_age <= 30 THEN transaction_amt ELSE 0 END) sum_30,
SUM(CASE WHEN transaction_age > 30 AND transaction_age <= 40 THEN transaction_amt ELSE 0 END) sum_30_40
FROM
(
SELECT merchant_id, DATE_PART('year', DATE(transaction_dt)) - DATE_PART('year', DATE(customer_dob)) transaction_age, transaction_amt
FROM hivetest
) m
GROUP BY merchant_id
) merchants_all
JOIN
(
SELECT merchant_id, CASE WHEN COUNT(DISTINCT card_number) > 1 THEN TRUE ELSE FALSE END has_distinct_cards_gt_1
FROM hivetest GROUP BY merchant_id ORDER BY 1
) merchants_cards
ON
(merchants_all.merchant_id = merchants_cards.merchant_id);
This produces the final data you need.
+-------------+--------------+-----------------+
| merchant_id | total_sum_30 | total_sum_30_40 |
+-------------+--------------+-----------------+
| 1 | 0 | 0 |
| 2 | 11.0 | 12.0 |
| 3 | 0 | 0 |
| 4 | 15.0 | 16.0 |
+-------------+--------------+-----------------+
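The final query above shows 0 for merchants that fail the card check, while the question asked for NULL. A minimal sketch of the same join that returns NULL instead, by dropping the ELSE branch, is shown below. It assumes SQLite via Python's sqlite3 module in place of PostgreSQL or Hive, so the year difference is computed with strftime instead of DATE_PART; the aliases s, c and distinct_cards are names chosen only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hivetest (merchant_id INTEGER, card_number TEXT, "
    "customer_dob TEXT, transaction_dt TEXT, transaction_amt REAL)"
)
# Same sample rows as in the answer.
conn.executemany(
    "INSERT INTO hivetest VALUES (?, ?, ?, ?, ?)",
    [
        (1, "A", "1997-12-01", "2017-11-01", 10.0),
        (2, "A", "1997-12-01", "2017-11-01", 11.0),
        (2, "B", "1980-12-01", "2017-11-01", 12.0),
        (3, "A", "1997-12-01", "2017-11-01", 13.0),
        (3, "A", "1997-12-01", "2017-11-01", 14.0),
        (4, "A", "1997-12-01", "2017-11-01", 15.0),
        (4, "C", "1980-12-01", "2017-11-01", 16.0),
    ],
)

# s: per-merchant bucket sums; c: per-merchant distinct card counts.
# A CASE without ELSE returns NULL when the merchant has a single card.
query = """
SELECT s.merchant_id,
       CASE WHEN c.distinct_cards > 1 THEN s.sum_30 END AS total_sum_30,
       CASE WHEN c.distinct_cards > 1 THEN s.sum_30_40 END AS total_sum_30_40
FROM (
    SELECT merchant_id,
           SUM(CASE WHEN age <= 30 THEN transaction_amt ELSE 0 END) AS sum_30,
           SUM(CASE WHEN age > 30 AND age <= 40 THEN transaction_amt ELSE 0 END) AS sum_30_40
    FROM (
        SELECT merchant_id, transaction_amt,
               CAST(strftime('%Y', transaction_dt) AS INTEGER)
               - CAST(strftime('%Y', customer_dob) AS INTEGER) AS age
        FROM hivetest
    ) m
    GROUP BY merchant_id
) s
JOIN (
    SELECT merchant_id, COUNT(DISTINCT card_number) AS distinct_cards
    FROM hivetest
    GROUP BY merchant_id
) c ON s.merchant_id = c.merchant_id
ORDER BY s.merchant_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that the card count here is taken per merchant across all ages, as in the answer, not per age bucket.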
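Let me know if this helps.
Answer 2 (score: 0)
The COUNT inside the SUM is the problem. Here is a solution; I haven't tested it. It isn't obvious which table person_org_code belongs to. If it is in merch_trans_daily, add person_org_code = 'P' to the WHERE clause of the WITH query. Let us know whether it works!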
WITH mtrans_count AS
(SELECT merch_num,
-- count distinct cards per merchant, as the question requires
COUNT(DISTINCT card_num) AS cnt
FROM a_sbp_db.merch_trans_daily
WHERE transaction_date LIKE '2017-09%'
GROUP BY merch_num
)
SELECT mtrans.merch_num
,FROM_UNIXTIME(UNIX_TIMESTAMP(), 'MMM-yyyy') AS process_month
,NVL(SUM(CASE
WHEN (
ROUND(DATEDIFF(mtrans.transaction_date, cdemo.date_birth) / 365) < 30
AND mtrans_count.cnt > 1
)
THEN mtrans.trans_amt
ELSE 0
END), NULL) AS total_age_less_30_1
,NVL(SUM(CASE
WHEN (
ROUND(DATEDIFF(mtrans.transaction_date, cdemo.date_birth) / 365) >= 30
AND ROUND(DATEDIFF(mtrans.transaction_date, cdemo.date_birth) / 365) < 40
AND mtrans_count.cnt > 1
)
THEN mtrans.trans_amt
ELSE 0
END), NULL) AS total_age_30_40_1
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
INNER JOIN mtrans_count ON mtrans_count.merch_num = mtrans.merch_num
WHERE mtrans.transaction_date LIKE '2017-09%'
AND person_org_code = 'P'
GROUP BY mtrans.merch_num;
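For the WITH approach to match the requirement, the helper query has to group by merchant and count distinct cards. A small sketch of that pattern follows, again assuming SQLite through Python's sqlite3 module as a stand-in for Impala; the table trans, its precomputed age column and the CTE name card_counts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trans (merch_num INTEGER, card_num TEXT, age INTEGER, trans_amt REAL)"
)
conn.executemany(
    "INSERT INTO trans VALUES (?, ?, ?, ?)",
    [
        (1, "A", 25, 2.5),   # merchant 1: one card in total
        (2, "D", 22, 10.0),  # merchant 2: three cards in total
        (2, "E", 28, 20.0),
        (2, "F", 33, 5.0),
    ],
)

# The CTE yields one row per merchant, so joining it back does not duplicate rows.
# cnt is a plain column of the join, so it may appear inside SUM(CASE ...).
query = """
WITH card_counts AS (
    SELECT merch_num, COUNT(DISTINCT card_num) AS cnt
    FROM trans
    GROUP BY merch_num
)
SELECT t.merch_num,
       SUM(CASE WHEN t.age < 30 AND cc.cnt > 1 THEN t.trans_amt END) AS total_age_less_30,
       SUM(CASE WHEN t.age >= 30 AND t.age < 40 AND cc.cnt > 1 THEN t.trans_amt END) AS total_age_30_40
FROM trans t
JOIN card_counts cc ON cc.merch_num = t.merch_num
GROUP BY t.merch_num
ORDER BY t.merch_num
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the count covers all of a merchant's cards, merchant 2 still reports its 30-40 total even though only one card falls in that bucket. If the check should apply per age group, the distinct count would need to be computed per bucket instead.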