SQL Server query for a calculation

Time: 2017-07-05 01:00:49

Tags: python sql sql-server pandas

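I need help writing the following SQL query. I am new to SQL. The table below is only an example of the kind of data I have. My real data is very large, about 30 million rows, and I want to write a query that produces the output table shown further down.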

   Id        type        data          time
-----------------------------------------------------------
    1          30          3.9          15:50:10.660555
    1          30          4.0          15:50:10.660777
    1          70          11.5         15:50:10.797966
    1          30          4.1          15:50:10.834444
    1          70          12.6         15:50:10.853114
    1          70          16.7         15:50:10.955086
    1          30          5            15:50:10.99
    11         30          3.8          15:50:11.660555
    11         30          4.1          15:50:11.660777
    11         70          12.5         15:50:11.797966
    11         30          4.7          15:50:11.834444
    11         70          12.68        15:50:11.853114
    11         70          16.76        15:50:11.955086
    11         30          5.1          15:50:11.99

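I have a table like the one above. For each type = 70 row, I need to compute a result using the last known type = 30 row before it. For example, for Id = 1, for the first type = 70 row at 15:50:10.797966, I need the type = 30 data at 15:50:10.660777 so that I can calculate result = 11.5 / 4.0. Similarly, for the type = 70 row at 15:50:10.853114, I want the type = 30 data at 15:50:10.834444, so my result = 12.6 / 4.1.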

I would like the output to look like this:

Id          type           result             time
------------------------------------------------------
1            70             11.5/4.0        15:50:10.797966
1            70             12.6/4.1        15:50:10.853114
1            70             16.7/4.1        15:50:10.955086
11           70             12.5/4.1        15:50:11.797966
11           70             12.68/4.7       15:50:11.853114
11           70             16.76/4.7       15:50:11.955086
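
The pairing rule described above can be checked on the Id = 1 sample rows with a small pure-Python sketch. This is only an illustration of the intended logic, not a replacement for doing the work in SQL; the function name `pair_with_last_30` is hypothetical, and the sketch assumes the time strings sort correctly as text.

```python
def pair_with_last_30(rows):
    """Pair every type-70 row with the most recent type-30 row of the same id.

    rows: iterable of (id, type, data, time) tuples.
    Assumption: the time strings share the HH:MM:SS.fraction format,
    so sorting them as text matches chronological order.
    """
    last_30 = {}  # id -> data of the latest type-30 row seen so far
    paired = []
    for row_id, row_type, data, time in sorted(rows, key=lambda r: (r[0], r[3])):
        if row_type == 30:
            last_30[row_id] = data
        elif row_type == 70:
            # None means no type-30 row has been seen yet for this id
            paired.append((row_id, time, data, last_30.get(row_id)))
    return paired


# The Id = 1 rows from the question's sample table
sample_rows = [
    (1, 30, 3.9, "15:50:10.660555"),
    (1, 30, 4.0, "15:50:10.660777"),
    (1, 70, 11.5, "15:50:10.797966"),
    (1, 30, 4.1, "15:50:10.834444"),
    (1, 70, 12.6, "15:50:10.853114"),
    (1, 70, 16.7, "15:50:10.955086"),
    (1, 30, 5.0, "15:50:10.99"),
]

for row_id, time, data_70, data_30 in pair_with_last_30(sample_rows):
    print(row_id, 70, f"{data_70}/{data_30}", time)
```

The numerator and denominator pairs it prints for Id = 1 are the same ones listed in the desired output table.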

I would like to be able to run these SQL queries from Python using pyodbc.
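
For the pyodbc part, one possible shape is sketched below. It streams the result in batches through the standard DB-API cursor methods rather than loading all rows at once, which may matter for a table of this size. The helper names `iter_rows` and `open_connection`, the batch size, and every value in the connection string (driver, server, database) are placeholders assumed for illustration; they are not taken from the question.

```python
def iter_rows(conn, query, batch_size=10_000):
    """Yield result rows as dicts, fetching batch_size rows at a time.

    conn is any DB-API connection; a pyodbc connection is the intended case.
    """
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        columns = [col[0] for col in cursor.description]
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            for row in batch:
                yield dict(zip(columns, row))
    finally:
        cursor.close()


def open_connection():
    """Open a SQL Server connection; all connection values are placeholders."""
    import pyodbc  # third-party package, imported here so the helper above stays generic

    return pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=my_server;DATABASE=my_db;Trusted_Connection=yes;"
    )
```

A call would look like `for row in iter_rows(open_connection(), query): ...`, where `query` holds the text of whichever SQL statement is chosen.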

Any help would be greatly appreciated. Thanks in advance!

2 answers:

Answer 0: (score: 1)

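Assuming each id has at least one type = 30 row before every type = 70 row, you can do this with OUTER APPLY: for each type = 70 row, get the max time of the type = 30 rows before it, then use the data at that time for the division.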

SELECT x.id,
       x.type,
       x.time,
       x.data*1.0/t.data as result
FROM
  (SELECT t.*,t1.maxtime_before
   FROM t 
   OUTER APPLY
     (SELECT max(time) AS maxtime_before
      FROM t t1
      WHERE t1.id=t.id AND t1.type=30 AND t1.time<t.time) t1
   WHERE type = 70
  ) x
JOIN t ON t.id=x.id AND t.type=30 AND t.time=x.maxtime_before

If a type = 70 row may have no type = 30 row before it, you can use the following to show NULL in the result column for that row:

WITH x AS
  (SELECT t.*,
          t1.maxtime_before
   FROM t
   OUTER APPLY
     (SELECT max(time) AS maxtime_before
      FROM t t1
      WHERE t1.id=t.id AND t1.type=30 AND t1.time<t.time) t1
   WHERE type = 70
  )
SELECT x.id,
       x.type,
       x.time,
       x.data*1.0/t.data as result
FROM t
JOIN x ON t.id=x.id AND t.type=30 AND t.time=x.maxtime_before
UNION ALL
SELECT id,
       type,
       time,
       NULL
FROM x
WHERE maxtime_before IS NULL


Another way is to use the max window function to keep track of the running maximum time of the type = 30 rows for each id.

WITH x AS
  (SELECT t.*,
          MAX(CASE WHEN type=30 THEN time END) OVER(PARTITION BY id ORDER BY time) AS running_max
   FROM t
  )
SELECT x.id,
       x.type,
       x.time,
       x.data*1.0/t.data as result
FROM x
JOIN t ON t.id=x.id AND t.type=30 AND t.time=x.running_max
WHERE x.type=70
UNION ALL
SELECT id,
       type,
       time,
       NULL
FROM x 
WHERE type = 70 AND running_max IS NULL
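
To make the running-maximum idea concrete, here is a rough pure-Python sketch of the same two steps: annotate every row with the latest type = 30 time seen so far for its id, then look up the data at that time. The function names and the three sample rows are made up for illustration, and the sketch assumes the time strings sort correctly as text.

```python
def running_max_30_time(rows):
    """Annotate each row with the latest type-30 time seen so far for its id.

    rows: list of (id, type, data, time) tuples.
    """
    latest = {}  # id -> time of the latest type-30 row so far
    annotated = []
    for row_id, row_type, data, time in sorted(rows, key=lambda r: (r[0], r[3])):
        if row_type == 30:
            # rows are visited in time order, so this is the running max
            latest[row_id] = time
        annotated.append((row_id, row_type, data, time, latest.get(row_id)))
    return annotated


def ratios(rows):
    """Divide each type-70 value by the type-30 value at the running max time."""
    data_30 = {(r[0], r[3]): r[2] for r in rows if r[1] == 30}
    result = []
    for row_id, row_type, data, time, running_max in running_max_30_time(rows):
        if row_type != 70:
            continue
        denom = data_30.get((row_id, running_max))
        # None plays the role of NULL when no type-30 row precedes this row
        result.append((row_id, time, data / denom if denom is not None else None))
    return result


# Hypothetical rows: the first type-70 row has no type-30 row before it
demo_rows = [
    (11, 70, 12.5, "15:50:11.100000"),
    (11, 30, 4.0, "15:50:11.200000"),
    (11, 70, 12.0, "15:50:11.300000"),
]
print(ratios(demo_rows))
```

The first row comes back with `None`, which corresponds to the NULL branch of the UNION ALL.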

Answer 1: (score: 1)

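There is a way to do this using only window functions.

For each row, get the previous type and data. In addition, enumerate the 70s in such a way that you can identify them as a group (you can do this with a cumulative sum).

In the next step, use a partitioned max to get the type = 30 data for each group, and finally do the calculation.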

select t.*,
       data / data_30 as result
from (select t.*,
             -- only a 70 row directly preceded by a 30 carries that 30's data
             max(case when type = 70 and prev_type = 30 then prev_data end) over (partition by id, grp) as data_30
      from (select t.*,
                   -- the counter increases on every 30, so a 30 and the 70s after it share one grp
                   sum(case when type <> 70 then 1 else 0 end) over (partition by id order by time) as grp,
                   lag(type) over (partition by id order by time) as prev_type,
                   lag(data) over (partition by id order by time) as prev_data
            from t
            where type in (30, 70)
           ) t
     ) t
where type = 70;

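There is an interesting aspect to this. By restricting the types to only 30 and 70, we guarantee that each group of 70s is directly preceded by a 30.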
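
The grouping trick can be traced by hand with a short Python sketch. It mimics the cumulative sum for a single id: the counter goes up on every non-70 row, so each type = 30 row and the 70s that follow it get the same group number. The function names are hypothetical, and for brevity the sketch reads the denominator from the 30 row of each group instead of reproducing the lag step.

```python
def group_by_preceding_30(rows):
    """rows: (type, data) pairs for one id, ordered by time, types 30/70 only."""
    grp = 0
    grouped = []
    for row_type, data in rows:
        if row_type != 70:
            grp += 1  # same effect as the cumulative sum over non-70 rows
        grouped.append((grp, row_type, data))
    return grouped


def ratios_per_group(rows):
    """Return (type-70 data, type-30 data of the same group) pairs."""
    grouped = group_by_preceding_30(rows)
    data_30 = {grp: data for grp, row_type, data in grouped if row_type == 30}
    # group 0 holds 70s with no 30 before them, so the lookup gives None
    return [(data, data_30.get(grp)) for grp, row_type, data in grouped if row_type == 70]


# The Id = 1 rows from the question, already in time order
id1_rows = [(30, 3.9), (30, 4.0), (70, 11.5), (30, 4.1), (70, 12.6), (70, 16.7), (30, 5.0)]

print(group_by_preceding_30(id1_rows))
print(ratios_per_group(id1_rows))
```

On these rows the group numbers come out as 1, 2, 2, 3, 3, 3, 4, which shows why 12.6 and 16.7 both end up divided by 4.1.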