SELECT
*
FROM(
SELECT
imps.org_name,
imps.org_id,
imps.adv_name,
imps.adv_id,
imps.mc,
Rank() over (partition by imps.org_id ORDER by imps.mc desc) as Rank
FROM(
SELECT
org_name,
org_id,
adv_name,
adv_id,
sum(cost/1000) as mc,
FROM
table1
WHERE
org_id in (12345, 54321)
AND
date
BETWEEN
'2016-09-10'
AND
'2016-11-01'
GROUP BY
adv_id,
org_name,
org_id,
adv_name) imps
GROUP BY
imps.org_name,
imps.org_id,
imps.adv_name,
imps.adv_id) r
WHERE r.Rank <= 5;
When I run this query, I get the error:
FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 10:65 Invalid column reference 'mc'
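Since the column is clearly defined, I'm not sure why this throws an error. I have tried sum(imps.mc), which seems to work, but I'm reluctant to put a sum inside the rank function because it doesn't look efficient.
Overall question: is there a better way to do this ranking?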
Answer 0 (score: 1)
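Here is what I would try: turn imps into a CTE (the with syntax). Is the ranking perhaps being executed before imps has finished being generated? If so, making it a CTE should rule that possibility out. I'm a big fan of finding the root cause, so I would follow up by adding the GROUP BY back in and seeing whether it still works. If it doesn't, we probably have the culprit, even if we don't yet understand why.
If we add the GROUP BY back and it still works, then we may have fixed an order-of-execution problem by forcing the engine to generate imps before it runs the window function.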
With imps as (
SELECT
org_name,
org_id,
adv_name,
adv_id,
sum(cost/1000) as mc
FROM
table1
WHERE
org_id in (12345, 54321)
AND
date
BETWEEN
'2016-09-10'
AND
'2016-11-01'
GROUP BY
adv_id,
org_name,
org_id,
adv_name)
SELECT
*
FROM(
SELECT
imps.org_name,
imps.org_id,
imps.adv_name,
imps.adv_id,
imps.mc,
Rank() over (partition by imps.org_id ORDER by imps.mc desc) as Rank
FROM imps) r
WHERE r.Rank <= 5;
Answer 1 (score: 1)
SELECT *
FROM
(
SELECT
org_name,
org_id,
adv_name,
adv_id,
sum(cost/1000) as mc,
Rank() over (partition by org_id ORDER by sum(cost/1000) desc) as Rank
FROM
table1
WHERE
org_id in (12345, 54321)
AND date BETWEEN '2016-09-10' AND '2016-11-01'
GROUP BY
adv_id,
org_name,
org_id,
adv_name) r
WHERE r.Rank <= 5;
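As xQbert said, the "," after mc in your innermost select is most likely the main problem you raised. But you can also compute RANK() in the innermost select and eliminate one level of nesting, as in the query above. Also, you aren't actually aggregating anything in your second query, so you can drop its GROUP BY entirely.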
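To make that last point concrete: as far as I understand it, once a query has a GROUP BY, Hive evaluates window functions against the grouped rows, so the OVER clause can only reference grouping columns or aggregates. In the outer query imps.mc is neither, which would explain the "Invalid column reference 'mc'" message and why sum(imps.mc) was accepted. Below is a minimal sketch of the original nested query with the outer GROUP BY dropped. It assumes the stray comma after mc is removed and that table1 has the columns shown in the question; the alias rnk is my own renaming of Rank, chosen only to avoid reusing the function name.

```sql
-- Sketch only: the original nested query without the outer GROUP BY.
-- The inner query already yields one row per (org, adv), so nothing is left to aggregate.
SELECT *
FROM (
  SELECT
    imps.org_name,
    imps.org_id,
    imps.adv_name,
    imps.adv_id,
    imps.mc,
    -- rnk is a hypothetical alias standing in for the original "Rank"
    rank() OVER (PARTITION BY imps.org_id ORDER BY imps.mc DESC) AS rnk
  FROM (
    SELECT
      org_name,
      org_id,
      adv_name,
      adv_id,
      sum(cost/1000) AS mc   -- no trailing comma before FROM
    FROM table1
    WHERE org_id IN (12345, 54321)
      -- date may need backticks on Hive versions that treat it as a reserved word
      AND date BETWEEN '2016-09-10' AND '2016-11-01'
    GROUP BY adv_id, org_name, org_id, adv_name
  ) imps
) r
WHERE r.rnk <= 5;
```

If you would rather keep the outer GROUP BY, adding imps.mc to its column list should have the same effect, since mc then becomes a grouping column the window function is allowed to see.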