我已经看到很多关于这个一般错误的问题,但我不明白为什么会出现这个错误,可能是因为嵌套的窗口函数......
通过下面的查询,我得到了 Col_C
、Col_D
、...以及我尝试过的几乎所有内容的错误
SQL 编译错误:[eachColumn] 不是一个有效的 group by 表达式
SELECT
Col_A,
Col_B,
FIRST_VALUE(Col_C) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
MAX(Col_D) OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
FIRST_VALUE(CASE WHEN Col_T = 'testvalue'
THEN LAST_VALUE(Col_E) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
ELSE NULL END) IGNORE NULLS
OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM mytable
那么,是否有一种在 Snowflake 中使用嵌套窗口函数的方法(使用 case when ...
)?如果是这样,我怎么/我做错了什么?
答案 0 :(得分:2)
因此解构您的逻辑以表明它是导致问题的第二个 FIRST_VALUE
WITH data(Col_A,Col_B,Col_c,col_d, Col_TimeStamp, col_t,col_e) AS (
SELECT * FROM VALUES
(1,1,1,1,1,'testvalue',10),
(1,1,2,3,2,'value',11)
)
SELECT
Col_A,
Col_B,
FIRST_VALUE(Col_C) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as first_c,
MAX(Col_D) OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
LAST_VALUE(Col_E) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as last_e,
IFF(Col_T = 'testvalue', last_e, NULL) as if_test_last_e
/*,FIRST_VALUE(if_test_last_e) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as the_problem*/
FROM data
ORDER BY Col_A,Col_B, col_timestamp
;
如果我们取消对 the_problem
的注释,我们就拥有了.. 与 PostgreSQL(我的背景)相比,仅仅重用这么多先前的结果/步骤是一种礼物,所以在这里我只是破坏了另一个 SELECT 层。
WITH data(Col_A,Col_B,Col_c,col_d, Col_TimeStamp, col_t,col_e) AS (
SELECT * FROM VALUES
(1,1,1,1,1,'testvalue',10),
(1,1,2,3,2,'value',11)
)
SELECT *,
FIRST_VALUE(if_test_last_e) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as not_a_problem
FROM (
SELECT
Col_A,
Col_B,
FIRST_VALUE(Col_C) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as first_c,
MAX(Col_D) OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
LAST_VALUE(Col_E) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as last_e,
IFF(Col_T = 'testvalue', last_e, NULL) as if_test_last_e
,Col_TimeStamp
FROM data
)
ORDER BY Col_A,Col_B, Col_TimeStamp
然后一切正常。如果您 LAG 然后 IFF/FIRST_VALUE 然后 LAG 第二个结果,也会发生这种情况。
答案 1 :(得分:1)
“我已经看到很多关于这个一般错误的问题,但我不明白为什么会出现这个错误,可能是因为嵌套的窗口函数......”
Snowflake 支持在同一级别重用表达式(有时称为 "lateral column alias reference" )
写起来完全没问题:
if(snapshot.data().containsKey("lastAccess")){
}
else{
}
在标准 SQL 中,您将不得不使用多级查询 (cte) 或使用 LATERAL JOIN。相关:PostgreSQL: using a calculated column in the same query
不幸的是,相同的语法不适用于分析函数(我现在知道任何支持它的 RDMBS):
SELECT 1+1 AS col1,
col1 *2 AS col2,
CASE WHEN col1 > col2 THEN 'Y' ELSE 'NO' AS col3
...
在 SQL Standard 2016 中有一个可选特性:T619 嵌套窗口函数。
这里有一篇文章介绍了嵌套分析函数查询的样子:Nested window functions in SQL。
这意味着当前嵌套窗口函数的方法是使用派生表/cte:
SELECT ROW_NUMBER() OVER(PARTITION BY ... ORDER BY ...) AS rn
,MAX(rn) OVER(PARTITION BY <different than prev) AS m
FROM tab;