SQL挑战/难题:给定堆栈跟踪 - 如何在每个时间点找到顶部元素?

时间:2016-10-08 19:27:07

标签: sql sql-server oracle hive teradata

  • 我的真实用例是merge nested ranges。 我画了一些草图然后我看到范围开始和结束之间的相似性来叠加PUSH和POP操作。我明白解决这个问题也会解决原来的问题。
  • 实际上可以从问题中删除操作列。当val为NULL时,它是POP操作,否则它是PUSH操作。

拼图

stack_trace 包含以下列:

  • i - 表示某个时间点的整数值。
  • op - 2可能的操作:(“in”/“push”)和 O (“out”/“pop”)。
  • val - “in”/“push”操作插入的值或“out”/“pop”操作的NULL。

    目标是在每个时间点( i )找到堆栈顶部的值。

e.g。

(NULL值在此表示为空格)

数据:

i   op  val 
--  --  --  
1   I   A   
2   I   B   
3   O
4   I   C
5   O    
6   O   

必填结果:

i   top_of_stack_val
--  ----------------
1   A
2   B
3   A
4   C
5   A
6   

要求

  • 解决方案应该是单个SQL查询(子查询很好)。
  • 仅允许以下条款: SELECT FROM WHERE GROUP BY HAVING ORDER BY
  • 不允许使用 WITH 子句(CTE - 公用表格表达式)
  • 不允许使用T-SQL,PL / SQL等
  • 不允许使用UDF(用户自定义函数)
  • 不允许使用变量

样本数据

create table stack_trace
(
    i       int
   ,op      char(1)
   ,val     char(1)
)
;

insert into stack_trace (i,op,val) values (1,'I','A');
insert into stack_trace (i,op,val) values (2,'I','B');
insert into stack_trace (i,op,val) values (3,'I','C');
insert into stack_trace (i,op,val) values (4,'I','D');
insert into stack_trace (i,op,val) values (5,'I','E');
insert into stack_trace (i,op)     values (6,'O');
insert into stack_trace (i,op)     values (7,'O');
insert into stack_trace (i,op)     values (8,'O');
insert into stack_trace (i,op,val) values (9,'I','F');
insert into stack_trace (i,op)     values (10,'O');
insert into stack_trace (i,op,val) values (11,'I','G');
insert into stack_trace (i,op,val) values (12,'I','H');
insert into stack_trace (i,op)     values (13,'O');
insert into stack_trace (i,op)     values (14,'O');
insert into stack_trace (i,op,val) values (15,'I','I');
insert into stack_trace (i,op,val) values (16,'I','J');
insert into stack_trace (i,op,val) values (17,'I','K');
insert into stack_trace (i,op,val) values (18,'I','L');
insert into stack_trace (i,op,val) values (19,'I','M');
insert into stack_trace (i,op)     values (20,'O');
insert into stack_trace (i,op,val) values (21,'I','N');
insert into stack_trace (i,op)     values (22,'O');
insert into stack_trace (i,op,val) values (23,'I','O');
insert into stack_trace (i,op)     values (24,'O');
insert into stack_trace (i,op,val) values (25,'I','P');
insert into stack_trace (i,op)     values (26,'O');
insert into stack_trace (i,op)     values (27,'O');
insert into stack_trace (i,op,val) values (28,'I','Q');
insert into stack_trace (i,op,val) values (29,'I','R');
insert into stack_trace (i,op)     values (30,'O');
insert into stack_trace (i,op)     values (31,'O');
insert into stack_trace (i,op)     values (32,'O');
insert into stack_trace (i,op)     values (33,'O');
insert into stack_trace (i,op)     values (34,'O');
insert into stack_trace (i,op)     values (35,'O');
insert into stack_trace (i,op,val) values (36,'I','S');
insert into stack_trace (i,op)     values (37,'O');
insert into stack_trace (i,op)     values (38,'O');
insert into stack_trace (i,op,val) values (39,'I','T');
insert into stack_trace (i,op,val) values (40,'I','U');
insert into stack_trace (i,op)     values (41,'O');
insert into stack_trace (i,op,val) values (42,'I','V');
insert into stack_trace (i,op,val) values (43,'I','W');
insert into stack_trace (i,op,val) values (44,'I','X');
insert into stack_trace (i,op)     values (45,'O');
insert into stack_trace (i,op)     values (46,'O');
insert into stack_trace (i,op,val) values (47,'I','Y');
insert into stack_trace (i,op)     values (48,'O');
insert into stack_trace (i,op)     values (49,'O');
insert into stack_trace (i,op,val) values (50,'I','Z');
insert into stack_trace (i,op)     values (51,'O');
insert into stack_trace (i,op)     values (52,'O');

必填结果

i   top_of_stack_val
--  ----------------
1   A
2   B
3   C
4   D
5   E
6   D
7   C
8   B
9   F
10  B
11  G
12  H
13  G
14  B
15  I
16  J
17  K
18  L
19  M
20  L
21  N
22  L
23  O
24  L
25  P
26  L
27  K
28  Q
29  R
30  Q
31  K
32  J
33  I
34  B
35  A
36  S
37  A
38  
39  T
40  U
41  T
42  V
43  W
44  X
45  W
46  V
47  Y
48  V
49  T
50  Z
51  T
52  

5 个答案:

答案 0 :(得分:5)

这是一个很好的谜题。

由于我的主要DBMS是Teradata,我使用分析功能(需要TD14.10 +)为它编写了一个解决方案:

SELECT dt.*,
   -- find the last item in the stack with the same position
   Last_Value(val IGNORE NULLS)
   Over (PARTITION BY pos
         ORDER BY i) AS top_of_stack_val
FROM 
 ( 
   SELECT st.*,
      -- calculate the number of items in the stack
      Sum(CASE WHEN op = 'I' THEN 1 ELSE -1 end) 
      Over (ORDER BY i
            ROWS Unbounded Preceding) AS pos
   FROM stack_trace AS st
 ) AS dt;

此解决方案也适用于Oracle,但PostgreSQL& SQL Server不支持IGNORE NULLS的{​​{1}}选项,并且模拟它非常复杂,例如参见Itzk Ben-Gan的The Last non NULL Puzzle

编辑:事实上它并不复杂,我忘记了Itzik的第二个解决方案,旧的背驮式技巧;-)

Martin Smith的方法适用于所有四个DBMS。

答案 1 :(得分:3)

我个人怀疑你最终会找到可以在所有SQL Server,Teradata,Postgres和Oracle中使用的SQL,并且如果表格非常大,那么它具有可接受的性能。

SQL Server解决方案(demo)如下

SELECT i,
       SUBSTRING(MAX(FORMAT(i, 'D10') + val) OVER (PARTITION BY Pos ORDER BY i 
                                                     ROWS UNBOUNDED PRECEDING), 11, 8000)
FROM   (SELECT st.*,
               sum(CASE WHEN op = 'I' THEN 1 ELSE -1 END) 
                  OVER (ORDER BY i ROWS UNBOUNDED PRECEDING) AS pos
        FROM   stack_trace st) t1
ORDER  BY i; 

答案 2 :(得分:3)

虽然确实需要额外的步骤 -

Hive Teradata Oracle SQL Server PostgreSQL

select      s.i

           ,min (s.val) over
            (
                partition by    s.depth
                               ,s.depth_val_seq
            )                                   as top_of_stack_val            

from       (select      s.i
                       ,s.val
                       ,s.depth

                       ,count (s.val) over 
                        (
                            partition by    s.depth
                            order by        s.i
                            rows            between unbounded preceding and current row
                        )                                                                   as depth_val_seq

            from       (select      s.i
                                   ,s.val

                                   ,sum (case s.op when 'I' then 1 else -1 end)   over
                                    (
                                        order by        s.i
                                        rows            between unbounded preceding and current row
                                    )                                                                   as depth

                        from        stack_trace s
                        )
                        s
            )
            s

order by    i
;

答案 3 :(得分:1)

这实际上是一个有趣的问题。我要做的是跟踪堆栈中的每个元素位置。您可以使用累积总和来执行此操作:

select st.*,
       sum(case when op = 'I' then 1 else -1 end) over (order by i) as pos
from stack_trace st;

唉,此时,我认为你需要一个相当复杂的连接或子查询来找出pos所指的最新值。这是一种方法:

with st as (
      select st.*,
             sum(case when op = 'I' then 1 else -1 end) over (order by i) as pos
      from stack_trace st
     )
select st.*,
       (select val
        from st st2
        where st2.i <= st.id and st2.pos = st.pos and
              st2.val is not null
        order by i desc
        fetch first 1 row only
       ) as top_of_stack_val
from st;

答案 4 :(得分:0)

的Teradata

select      s.i

           ,first_value (s.val) over 
            (
                partition by    s.depth
                order by        s.i
                reset when      s.op = 'I'
            )                                as top_of_stack_val

from       (select      s.i
                       ,s.val
                       ,s.op

                       ,sum (case s.op when 'I' then 1 else -1 end)   over
                        (
                            order by        s.i
                            rows            unbounded preceding
                        )                                                   as depth

            from        stack_trace s
            )
            s

order by    i
;