SQL

Posted: 2016-02-02 05:54:33

Tags: sql oracle difference window-functions oracle-analytics

I have the following table in SQL:

RN   Name   value1  value2  Timestamp
1    Mark   110     210     20160119
1    Mark   106     205     20160115
1    Mark   103     201     20160112
2    Steve  120     220     20151218
2    Steve  111     210     20151210
2    Steve  104     206     20151203

Desired output:

RN   Name   value1lag1  value1lag2  value2lag1  value2lag2
1    Mark   4           3           5           4
2    Steve  9           7           10          4

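The differences are calculated from the most recent row to the second most recent, then from the second most recent to the third most recent. For RN 1: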

value1lag1 = 110-106 = 4

value1lag2 = 106-103 = 3

value2lag1 = 210-205 = 5

value2lag2 = 205-201 = 4

The same applies to the other RN.

Note: each RN always has exactly 3 rows.
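To pin down the arithmetic above, here is a small Python reference check (not a SQL solution) that computes the desired output from the sample rows. It is a sketch that assumes, as the note says, exactly three rows per RN, and it relies on the YYYYMMDD timestamps sorting chronologically as plain strings. `ROWS` and `lag_differences` are hypothetical names introduced for illustration.

```python
from collections import defaultdict

# Sample rows from the question: (RN, Name, value1, value2, Timestamp as YYYYMMDD text)
ROWS = [
    (1, "Mark", 110, 210, "20160119"),
    (1, "Mark", 106, 205, "20160115"),
    (1, "Mark", 103, 201, "20160112"),
    (2, "Steve", 120, 220, "20151218"),
    (2, "Steve", 111, 210, "20151210"),
    (2, "Steve", 104, 206, "20151203"),
]


def lag_differences(rows):
    """Return {(rn, name): (value1lag1, value1lag2, value2lag1, value2lag2)}."""
    groups = defaultdict(list)
    for rn, name, v1, v2, ts in rows:
        groups[(rn, name)].append((ts, v1, v2))
    result = {}
    for key, items in groups.items():
        # Newest first: most recent, second most recent, third most recent.
        # Unpacking raises ValueError if a group does not have exactly 3 rows.
        (_, a1, a2), (_, b1, b2), (_, c1, c2) = sorted(items, reverse=True)
        result[key] = (a1 - b1, b1 - c1, a2 - b2, b2 - c2)
    return result


for (rn, name), diffs in sorted(lag_differences(ROWS).items()):
    print(rn, name, *diffs)
# 1 Mark 4 3 5 4
# 2 Steve 9 7 10 4
```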

I have tried to get help from similar posts in several ways, but with no luck.

4 answers:

Answer 0 (score: 1)

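I'm assuming RN and Name are linked here. It's a bit messy, but if each RN always has 3 values and you always want to compare them in this order, something like this should work.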


Answer 1 (score: 1)

There are other answers here, but I think what your problem calls for is analytic functions, in particular LAG():

select
    rn,
    name,
    -- calculate the differences
    value1 - v1l1 value1lag1,
    v1l1 - v1l2 value1lag2,
    value2 - v2l1 value2lag1,
    v2l1 - v2l2 value2lag2
 from (
     select 
       rn, 
       name, 
       value1, 
       value2, 
       timestamp, 
       -- these two are the values from the row before this one ordered by timestamp (ascending)
       lag(value1) over(partition by rn, name order by timestamp asc) v1l1,
       lag(value2) over(partition by rn, name order by timestamp asc) v2l1,
       -- these two are the values from two rows before this one ordered by timestamp (ascending)
       lag(value1, 2) over(partition by rn, name order by timestamp asc) v1l2,
       lag(value2, 2) over(partition by rn, name order by timestamp asc) v2l2

    from (
      select
      1 rn, 'Mark' name, 110 value1, 210 value2, '20160119' timestamp
      from dual
      union all
      select
      1 rn, 'Mark' name, 106 value1, 205 value2, '20160115' timestamp
      from dual
      union all
      select
      1 rn, 'Mark' name, 103 value1, 201 value2, '20160112' timestamp
      from dual
      union all
      select
      2 rn, 'Steve' name, 120 value1, 220 value2, '20151218' timestamp
      from dual
      union all
      select
      2 rn, 'Steve' name, 111 value1, 210 value2, '20151210' timestamp
      from dual
      union all
      select
      2 rn, 'Steve' name, 104 value1, 206 value2, '20151203' timestamp
      from dual
    ) data
)
where 
-- return only the rows that have defined values
v1l1 is not null and 
v1l2 is not null and
v2l1 is not null and 
v2l2 is not null

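The advantage of this approach is that Oracle does all the necessary buffering internally, so you avoid self-joins and the like. For large data sets this matters a lot for performance.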

For example, the explain plan for this query looks something like this:

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |     6 |   150 |    13   (8)| 00:00:01 |
|*  1 |  VIEW            |      |     6 |   150 |    13   (8)| 00:00:01 |
|   2 |   WINDOW SORT    |      |     6 |   138 |    13   (8)| 00:00:01 |
|   3 |    VIEW          |      |     6 |   138 |    12   (0)| 00:00:01 |
|   4 |     UNION-ALL    |      |       |       |            |          |
|   5 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
|   6 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
|   7 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
|   8 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
|   9 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
|  10 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("V1L1" IS NOT NULL AND "V1L2" IS NOT NULL AND "V2L1" IS 

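Note that there are no joins, only a single WINDOW SORT, which buffers the necessary data from the "data source" (in our case VIEW 3, the UNION ALL of our SELECT ... FROM DUAL rows) in order to partition it and compute the different lags.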
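A variant of the same analytic-function idea orders each partition newest first, reads the older values with LEAD, and keeps the latest row with ROW_NUMBER() = 1 instead of testing the lag columns for NULL. The sketch below runs it through Python's `sqlite3` so it can be tried without an Oracle instance; it assumes the bundled SQLite is 3.25 or newer (window functions). Oracle also provides LEAD and ROW_NUMBER, but portability of this exact query is an assumption. The table `samples`, its column `ts`, and `lag_report` are hypothetical names.

```python
import sqlite3

# Sample rows from the question: (rn, name, value1, value2, ts as YYYYMMDD text)
ROWS = [
    (1, "Mark", 110, 210, "20160119"),
    (1, "Mark", 106, 205, "20160115"),
    (1, "Mark", 103, 201, "20160112"),
    (2, "Steve", 120, 220, "20151218"),
    (2, "Steve", 111, 210, "20151210"),
    (2, "Steve", 104, 206, "20151203"),
]

# Newest row first in each (rn, name) partition, so LEAD(x, k) is the value
# k rows older; pos = 1 keeps only the most recent row of each partition.
QUERY = """
SELECT rn, name,
       value1 - v1_prev1   AS value1lag1,
       v1_prev1 - v1_prev2 AS value1lag2,
       value2 - v2_prev1   AS value2lag1,
       v2_prev1 - v2_prev2 AS value2lag2
FROM (
    SELECT rn, name, value1, value2,
           LEAD(value1, 1) OVER (PARTITION BY rn, name ORDER BY ts DESC) AS v1_prev1,
           LEAD(value1, 2) OVER (PARTITION BY rn, name ORDER BY ts DESC) AS v1_prev2,
           LEAD(value2, 1) OVER (PARTITION BY rn, name ORDER BY ts DESC) AS v2_prev1,
           LEAD(value2, 2) OVER (PARTITION BY rn, name ORDER BY ts DESC) AS v2_prev2,
           ROW_NUMBER()    OVER (PARTITION BY rn, name ORDER BY ts DESC) AS pos
    FROM samples
)
WHERE pos = 1
ORDER BY rn
"""


def lag_report(rows):
    """Load rows into an in-memory SQLite table and run QUERY."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE samples (rn INTEGER, name TEXT, value1 INTEGER,"
            " value2 INTEGER, ts TEXT)"
        )
        con.executemany("INSERT INTO samples VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in lag_report(ROWS):
    print(row)
```

Filtering on the row number rather than on NULL lags means a partition with more than three rows still yields exactly one output row, and a partition with fewer rows shows NULL (None) in the missing lag columns instead of disappearing.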

Answer 2 (score: 0)

If it's only this specific case, it isn't that hard. It takes three steps:

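  1. Self-join the table to compute the differences between consecutive rows:

    select t1.RN,
           t1.Name,
           t1.rm,
           t1.value1 - t2.value1 as value1,  -- newer minus next-older
           t1.value2 - t2.value2 as value2
    from
    (select RN, Name, value1, value2,
            row_number() over (partition by Name order by Timestamp desc) as rm
       from your_table) t1  -- your_table is a placeholder for the real table name
    join
    (select RN, Name, value1, value2,
            row_number() over (partition by Name order by Timestamp desc) as rm
       from your_table) t2
    on t1.Name = t2.Name
    and t2.rm = t1.rm + 1  -- t2 is the row just older than t1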
    
  2. Save that result as a table, say table3, and pivot it:

    select * from (
      select t3.RN, t3.Name, t3.rm, t3.value1 from table3 t3
    )
    pivot
    (
      max(value1)
      for rm in (1 as value1lag1, 2 as value1lag2)
    ) v1
    

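  3. Build the same pivot for value2, then join the two pivoted results on RN and Name to get the final output.

    I suspect there is a better way, though. I'm not sure whether two pivots can be combined while pivoting, so I would join them after pivoting, which means producing two intermediate tables. It isn't pretty, but it's the best I could come up with.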
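One way around the two-pivots-plus-join step is conditional aggregation (`MAX(CASE ...)`) in place of PIVOT: rank the rows newest first, self-join each row to the next-older one as in step 1, and fold both difference rows per RN into columns in a single GROUP BY. The sketch below uses SQLite through Python's `sqlite3` so it is runnable as-is; it assumes SQLite 3.25 or newer for ROW_NUMBER, and `samples`, `ts`, and `pivot_report` are hypothetical names. (Oracle's PIVOT clause also accepts more than one aggregate, e.g. `max(value1) as v1, max(value2) as v2`, which is another way to avoid the join.)

```python
import sqlite3

# Sample rows from the question: (rn, name, value1, value2, ts as YYYYMMDD text)
ROWS = [
    (1, "Mark", 110, 210, "20160119"),
    (1, "Mark", 106, 205, "20160115"),
    (1, "Mark", 103, 201, "20160112"),
    (2, "Steve", 120, 220, "20151218"),
    (2, "Steve", 111, 210, "20151210"),
    (2, "Steve", 104, 206, "20151203"),
]

# pos = 1 is the newest row; joining pos to pos + 1 pairs each row with the
# next-older one, then MAX(CASE ...) turns the pos 1 / pos 2 rows into columns.
QUERY = """
WITH ranked AS (
    SELECT rn, name, value1, value2,
           ROW_NUMBER() OVER (PARTITION BY rn, name ORDER BY ts DESC) AS pos
    FROM samples
)
SELECT rn, name,
       MAX(CASE WHEN pos = 1 THEN d1 END) AS value1lag1,
       MAX(CASE WHEN pos = 2 THEN d1 END) AS value1lag2,
       MAX(CASE WHEN pos = 1 THEN d2 END) AS value2lag1,
       MAX(CASE WHEN pos = 2 THEN d2 END) AS value2lag2
FROM (
    SELECT a.rn, a.name, a.pos,
           a.value1 - b.value1 AS d1,  -- newer minus next-older
           a.value2 - b.value2 AS d2
    FROM ranked a
    JOIN ranked b
      ON b.rn = a.rn AND b.name = a.name AND b.pos = a.pos + 1
)
GROUP BY rn, name
ORDER BY rn
"""


def pivot_report(rows):
    """Load rows into an in-memory SQLite table and run QUERY."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE samples (rn INTEGER, name TEXT, value1 INTEGER,"
            " value2 INTEGER, ts TEXT)"
        )
        con.executemany("INSERT INTO samples VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in pivot_report(ROWS):
    print(row)
```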

Answer 3 (score: 0)

-- test data
with data(rn,
name,
value1,
value2,
timestamp) as
 (select 1, 'Mark', 110, 210, to_date('20160119', 'YYYYMMDD')
    from dual
  union all
  select 1, 'Mark', 106, 205, to_date('20160115', 'YYYYMMDD')
    from dual
  union all
  select 1, 'Mark', 103, 201, to_date('20160112', 'YYYYMMDD')
    from dual
  union all
  select 2, 'Steve', 120, 220, to_date('20151218', 'YYYYMMDD')
    from dual
  union all
  select 2, 'Steve', 111, 210, to_date('20151210', 'YYYYMMDD')
    from dual
  union all
  select 2, 'Steve', 104, 206, to_date('20151203', 'YYYYMMDD') from dual),

-- first unpivot value1, value2 into rows of (val_id 1 or 2, value)
data2 as
 (select d.rn, d.name, 1 as val_id, d.value1 as value, d.timestamp
    from data d
  union all
  select d.rn, d.name, 2 as val_id, d.value2 as value, d.timestamp
    from data d)

select *  -- find previous row P of row D, evaluate difference and build column name as desired
  from (select d.rn,
               d.name,
               d.value - p.value as value,
               'value' || d.val_id || 'Lag' || row_number() over(partition by d.rn, d.val_id order by d.timestamp desc) as col
          from data2 p, data2 d
         where p.rn = d.rn
           and p.val_id = d.val_id
           and p.timestamp =
               (select max(pp.timestamp)
                  from data2 pp
                 where pp.rn = p.rn
                   and pp.val_id = p.val_id
                   and pp.timestamp < d.timestamp))
       -- pivot
       pivot(sum(value) for col in('value1Lag1',
                                   'value1Lag2',
                                   'value2Lag1',
                                   'value2Lag2'));