I have a SQL query and a question about operator precedence in it:
SELECT
foo,
count(*)
FROM
A
JOIN (SELECT
SUM(IF(bar = 2,1,0)) as bar_sum,
SUM(IF(foo >= 1,1,0)) as foo,
SUM(1) as sum_1
FROM
B
) as sums
GROUP BY
id,
bar_sum,
foo,
sum_1
ON A.id = B.id
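Does the GROUP BY outside the parentheses really apply to the query inside the parentheses?
Note that I need to port this SQL from Hive to the Spark Scala DataFrame API, so I really need to get the operator precedence right. Judging from "What is the execution sequence of Group By, Having and Where clause in SQL Server?", this is usually the case, but I could not find any documentation about ().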
Answer 0 (score: 1)
This query does not look correct (the ON clause is missing):
SELECT foo,count(*)
FROM A
JOIN (SELECT SUM(IF(bar = 2,1,0)) as bar_sum,
SUM(IF(foo >= 1,1,0)) as foo,
SUM(1) as sum_1
FROM B) as sums -- the `ON` clause should go here
GROUP BY id, bar_sum, foo, sum_1;
-- looks like grouping by sum_1, bar_sum is superfluous
The GROUP BY applies only to the outer query. Note that the inner query returns a single row:
SELECT SUM(IF(bar = 2,1,0)) as bar_sum,
SUM(IF(foo >= 1,1,0)) as foo,
SUM(1) as sum_1
FROM B
-- single row
That single row is then joined to table A, and you get as many rows as there are distinct (id, foo) values.
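To check this behaviour without a Hive or Spark cluster, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made up for illustration. Hive's IF(...) is swapped for CASE WHEN, since SQLite has no IF function, and the join is written as an explicit CROSS JOIN because the subquery yields only one row. It shows the scoping of GROUP BY only and is not a statement about Hive's planner.

```python
import sqlite3

# In-memory database with made-up sample data (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER);
    CREATE TABLE B (id INTEGER, foo INTEGER, bar INTEGER);
    INSERT INTO A VALUES (1), (1), (2), (3);
    INSERT INTO B VALUES (1, 0, 2), (2, 1, 2), (3, 5, 1);
""")

# The subquery from the question; CASE WHEN stands in for Hive's IF().
INNER_SQL = """
    SELECT SUM(CASE WHEN bar = 2 THEN 1 ELSE 0 END) AS bar_sum,
           SUM(CASE WHEN foo >= 1 THEN 1 ELSE 0 END) AS foo,
           SUM(1) AS sum_1
    FROM B
"""

# An aggregate query without GROUP BY collapses B into exactly one row.
inner_rows = conn.execute(INNER_SQL).fetchall()

# The GROUP BY sits outside the parentheses, so it groups the joined result.
outer_rows = conn.execute(f"""
    SELECT id, foo, COUNT(*)
    FROM A
    CROSS JOIN ({INNER_SQL}) AS sums
    GROUP BY id, bar_sum, foo, sum_1
    ORDER BY id
""").fetchall()

print(inner_rows)  # [(2, 2, 3)]
print(outer_rows)  # [(1, 2, 2), (2, 2, 1), (3, 2, 1)]
```

With this sample data the subquery's columns are constant across the joined result, so the groups are driven by id alone, which is why grouping by bar_sum and sum_1 adds nothing.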