Querying less-than / greater-than values in Snowflake

Date: 2021-07-23 10:32:49

Tags: sql snowflake-cloud-data-platform

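Suppose there is a SQL table in Snowflake in which all columns are of type int (the sample table is at the end of the post).

I need a SQL query that returns the sum of a column over the rows that fall below an input value. For clarity, here is the query that filters by column x: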

select 
    max(cumsum.cumulative),
    xlimit.val
from (select 
        source.x,
        sum(source.y) over(order by source.x asc range between unbounded preceding and current row) as cumulative 
      from sampletable source) as cumsum
        join (select * from values (3), (5), (10)) as xlimit(val)
            on cumsum.x < xlimit.val
    group by xlimit.val;

The goal of the query above is:

"Return the sum of column y where column x is less than 3, 5 and 10, respectively"

The input parameters 3, 5 and 10 can vary.
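The single-filter query works because, with `range between unbounded preceding and current row`, each row's running total already includes every row with the same or a smaller x, so the largest running total below a limit equals an ordinary filtered SUM. A minimal, self-contained sketch compares the two approaches. It assumes only the x < 3 rows of the sample table, inlined as a hypothetical `sample_rows` CTE, and uses illustrative limits 1, 2 and 3 rather than the real parameters:

```sql
-- Sketch: why the running-total trick works when there is only ONE filter.
-- Hypothetical CTE: the x < 3 rows of the sample table, inlined so this runs standalone.
WITH sample_rows (x, y, z) AS (
    SELECT 0,  0,   0 UNION ALL
    SELECT 0, 10,   0 UNION ALL
    SELECT 0, 20, 100 UNION ALL
    SELECT 0, 30, 100 UNION ALL
    SELECT 1, 10, 100 UNION ALL
    SELECT 1, 20, 100 UNION ALL
    SELECT 1, 30, 200 UNION ALL
    SELECT 1, 40, 200 UNION ALL
    SELECT 2, 20, 200 UNION ALL
    SELECT 2, 30, 200 UNION ALL
    SELECT 2, 40, 300 UNION ALL
    SELECT 2, 50, 300
),
-- Illustrative limits only, chosen to fit the inlined subset
xlimit (val) AS (
    SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
),
running_totals AS (
    SELECT
        x,
        y,
        -- RANGE ... CURRENT ROW includes all peers with the same x,
        -- so each row carries the total of y for every x up to its own
        SUM(y) OVER (ORDER BY x
                     RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative
    FROM sample_rows
)
SELECT
    xlimit.val,
    MAX(running_totals.cumulative) AS via_running_total,  -- the question's approach
    SUM(running_totals.y)          AS via_plain_sum       -- ordinary filtered aggregate
FROM running_totals
JOIN xlimit
  ON running_totals.x < xlimit.val
GROUP BY xlimit.val
ORDER BY xlimit.val;
```

Both columns come out as 60, 160 and 300 for the limits 1, 2 and 3. The equivalence depends on the window's ORDER BY matching the one filter being applied; a second, independent filter on another column is not reflected in the running total.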

But if I extend this query to also filter on column z being greater than a value, the query returns incorrect values:

select 
    max(cumsum.cumulative),
    xlimit.val,
    zlimit.val
from (select
        source.x,
        source.z,
        sum(source.y) over(order by source.x asc, source.z desc range between unbounded preceding and current row) as cumulative 
      from sampletable source) as cumsum
        join (select * from values (3), (5), (10)) as xlimit(val)
            on cumsum.x < xlimit.val
        join (select * from values (100), (200), (800)) as zlimit(val)
            on cumsum.z > zlimit.val
    group by xlimit.val, zlimit.val;

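The goal of the query above is:

"Return the sum of column y where column x is less than 3, 5 and 10 and column z is greater than 100, 200 and 800, respectively"

The problem is that for x < 3 and z > 100 the expected result is 210, but the query returns 300.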

Actual results:

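+------------------------+----+-----+
| MAX(CUMSUM.CUMULATIVE) | x  | z   |
+------------------------+----+-----+
|                    300 |  3 | 100 |
|                    700 |  5 | 100 |
|                   1770 | 10 | 100 |
|                   1770 | 10 | 500 |
|                   1770 | 10 | 800 |
+------------------------+----+-----+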

Expected results:

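+------------------------+----+-----+
| MAX(CUMSUM.CUMULATIVE) | x  | z   |
+------------------------+----+-----+
|                    210 |  3 | 100 |
|                    610 |  5 | 100 |
|                   1680 | 10 | 100 |
|                    960 | 10 | 500 |
|                    100 | 10 | 800 |
+------------------------+----+-----+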
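The 300 comes from the running total itself. The window is computed over the whole table inside the subquery, before the joins apply the x and z filters, so the running total on the last x = 2, z = 200 rows also includes the y values of earlier rows whose z is 0 or 100 (0 + 10 + 20 + 30 + 10 + 20 = 90, and 300 - 90 = 210). A self-contained diagnostic sketch reproduces both numbers. It assumes only the x < 3 rows of the sample table, inlined as a hypothetical `sample_rows` CTE, and reuses the window definition from the second query:

```sql
-- Diagnostic sketch for the x < 3, z > 100 case.
-- Hypothetical CTE: the x < 3 rows of the sample table, inlined so this runs standalone.
WITH sample_rows (x, y, z) AS (
    SELECT 0,  0,   0 UNION ALL
    SELECT 0, 10,   0 UNION ALL
    SELECT 0, 20, 100 UNION ALL
    SELECT 0, 30, 100 UNION ALL
    SELECT 1, 10, 100 UNION ALL
    SELECT 1, 20, 100 UNION ALL
    SELECT 1, 30, 200 UNION ALL
    SELECT 1, 40, 200 UNION ALL
    SELECT 2, 20, 200 UNION ALL
    SELECT 2, 30, 200 UNION ALL
    SELECT 2, 40, 300 UNION ALL
    SELECT 2, 50, 300
),
running_totals AS (
    SELECT
        x,
        y,
        z,
        -- Same window as the question: it accumulates every earlier row in
        -- (x asc, z desc) order, including rows that fail z > 100
        SUM(y) OVER (ORDER BY x ASC, z DESC
                     RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative
    FROM sample_rows
)
SELECT
    -- What the question's query reports for x < 3, z > 100
    MAX(CASE WHEN z > 100 THEN cumulative END) AS max_running_total,
    -- What is actually wanted: only y values from rows that pass z > 100
    SUM(CASE WHEN z > 100 THEN y END)          AS filtered_sum
FROM running_totals;
```

This returns max_running_total = 300 and filtered_sum = 210. Filtering the rows and then summing them, rather than filtering a running total that was computed over all rows, gives the expected value.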

Can anyone help me see what I'm doing wrong?

Sample table:

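+---+----+-----+
| x | y  | z   |
+---+----+-----+
| 0 | 00 | 000 |
| 0 | 10 | 000 |
| 0 | 20 | 100 |
| 0 | 30 | 100 |
| 1 | 10 | 100 |
| 1 | 20 | 100 |
| 1 | 30 | 200 |
| 1 | 40 | 200 |
| 2 | 20 | 200 |
| 2 | 30 | 200 |
| 2 | 40 | 300 |
| 2 | 50 | 300 |
| 3 | 30 | 300 |
| 3 | 40 | 300 |
| 3 | 50 | 400 |
| 3 | 60 | 400 |
| 4 | 40 | 400 |
| 4 | 50 | 400 |
| 4 | 60 | 500 |
| 4 | 70 | 500 |
| 5 | 50 | 500 |
| 5 | 60 | 500 |
| 5 | 70 | 600 |
| 5 | 80 | 600 |
| 6 | 60 | 600 |
| 6 | 70 | 600 |
| 6 | 80 | 700 |
| 6 | 90 | 800 |
| 7 | 70 | 700 |
| 7 | 80 | 700 |
| 7 | 90 | 800 |
| 7 | 00 | 800 |
| 8 | 80 | 800 |
| 8 | 90 | 800 |
| 8 | 00 | 900 |
| 8 | 10 | 900 |
| 9 | 90 | 900 |
| 9 | 00 | 900 |
| 9 | 10 | 000 |
| 9 | 20 | 000 |
+---+----+-----+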

Complete working example:

drop table if exists testtable;
create table testtable(
  x int,
  y int,
  z int
);

insert into testtable values
  (0, 00, 000),
  (0, 10, 000),
  (0, 20, 100),
  (0, 30, 100),
  (1, 10, 100),
  (1, 20, 100),
  (1, 30, 200),
  (1, 40, 200),
  (2, 20, 200),
  (2, 30, 200),
  (2, 40, 300),
  (2, 50, 300),
  (3, 30, 300),
  (3, 40, 300),
  (3, 50, 400),
  (3, 60, 400),
  (4, 40, 400),
  (4, 50, 400),
  (4, 60, 500),
  (4, 70, 500),
  (5, 50, 500),
  (5, 60, 500),
  (5, 70, 600),
  (5, 80, 600),
  (6, 60, 600),
  (6, 70, 600),
  (6, 80, 700),
  (6, 90, 800),
  (7, 70, 700),
  (7, 80, 700),
  (7, 90, 800),
  (7, 00, 800),
  (8, 80, 800),
  (8, 90, 800),
  (8, 00, 900),
  (8, 10, 900),
  (9, 90, 900),
  (9, 00, 900),
  (9, 10, 000),
  (9, 20, 000);

select 
    max(cumsum.cumulative),
    xmlimit.val xmlimit,
    zmlimit.val zmlimit
from (select
        source.x,
        source.z,
        sum(source.y) over(order by source.x asc, source.z desc range between unbounded preceding and current row) as cumulative
      from testtable source) as cumsum
        join (select * from values (3), (5), (10)) as xmlimit(val)
            on cumsum.x < xmlimit.val
        join (select * from values (100), (500), (800)) as zmlimit(val)
            on cumsum.z > zmlimit.val
    group by xmlimit.val, zmlimit.val;

1 Answer

Answer 0 (score: 1)

Is this what you're looking for?

--------------
WITH xmlimit (val) AS (
        select * from (values (3), (5), (10)) AS x
     )
   , zmlimit (val) AS (
        select * from (values (100), (500), (800)) AS x
     )
select SUM(y)
     , xmlimit.val xmlimit
     , zmlimit.val zmlimit
  from testtable AS cumsum
  JOIN xmlimit
    ON cumsum.x < xmlimit.val
  JOIN zmlimit
    ON cumsum.z > zmlimit.val
 GROUP BY xmlimit.val, zmlimit.val
--------------

+--------+---------+---------+
| SUM(y) | xmlimit | zmlimit |
+--------+---------+---------+
|    210 |       3 |     100 |
|    610 |       5 |     100 |
|   1680 |      10 |     100 |
|    960 |      10 |     500 |
|    100 |      10 |     800 |
+--------+---------+---------+

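I adapted your SQL to work with the engine I'm using. If that result is what you want, just remove your running SUM window function and compute the SUM directly, grouped by the x and z limits. Assuming the rest of your SQL works in Snowflake, adapt it roughly like this: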

select 
    SUM(source.y),
    xlimit.val,
    zlimit.val
from sampletable as source
        -- pair each row with every x limit it falls below...
        join (select * from values (3), (5), (10)) as xlimit(val)
            on source.x < xlimit.val
        -- ...and every z limit it exceeds, so SUM only sees rows passing both filters
        join (select * from values (100), (200), (800)) as zlimit(val)
            on source.z > zlimit.val
    group by xlimit.val, zlimit.val;