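I need help writing the following SQL query; I'm new to SQL. The table below is just an example of the kind of data I have. My real data is very large, about 30 million rows, and I want to write a query that produces the output table shown further down.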
Id type data time
-----------------------------------------------------------
1 30 3.9 15:50:10.660555
1 30 4.0 15:50:10.660777
1 70 11.5 15:50:10.797966
1 30 4.1 15:50:10.834444
1 70 12.6 15:50:10.853114
1 70 16.7 15:50:10.955086
1 30 5 15:50:10.99
11 30 3.8 15:50:11.660555
11 30 4.1 15:50:11.660777
11 70 12.5 15:50:11.797966
11 30 4.7 15:50:11.834444
11 70 12.68 15:50:11.853114
11 70 16.76 15:50:11.955086
11 30 5.1 15:50:11.99
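I have the table above. For each type 70 row, I need to compute a result using the last known type 30 row before it. For example, for Id = 1, the first type = 70 row is at 15:50:10.797966, so I need the type = 30 data at 15:50:10.660777 and can compute result = 11.5 / 4.0. Similarly, for the type = 70 row at 15:50:10.853114, I want the type = 30 data at 15:50:10.834444, so my result = 12.6 / 4.1.
I'd like the output to look like this: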
Id type result time
------------------------------------------------------
1 70 11.5/4.0 15:50:10.797966
1 70 12.6/4.1 15:50:10.853114
1 70 16.7/4.1 15:50:10.955086
11 70 12.5/4.1 15:50:11.797966
11 70 12.68/4.7 15:50:11.853114
11 70 16.76/4.7 15:50:11.955086
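To pin the rule down before looking at SQL, here is a minimal plain-Python sketch (not from the question or the answers) that reproduces the output table from the sample rows. It assumes the rows fit in memory, so it is only a way to check expected results on small samples, not something to run over 30 million rows. It also assumes every time string has the same `HH:MM:SS.fraction` layout, so comparing them as strings orders them correctly. `SAMPLE_ROWS` and `last_known_ratios` are hypothetical names.

```python
from collections import defaultdict

# The sample data from the question: (id, type, data, time).
SAMPLE_ROWS = [
    (1, 30, 3.9, "15:50:10.660555"), (1, 30, 4.0, "15:50:10.660777"),
    (1, 70, 11.5, "15:50:10.797966"), (1, 30, 4.1, "15:50:10.834444"),
    (1, 70, 12.6, "15:50:10.853114"), (1, 70, 16.7, "15:50:10.955086"),
    (1, 30, 5.0, "15:50:10.99"),
    (11, 30, 3.8, "15:50:11.660555"), (11, 30, 4.1, "15:50:11.660777"),
    (11, 70, 12.5, "15:50:11.797966"), (11, 30, 4.7, "15:50:11.834444"),
    (11, 70, 12.68, "15:50:11.853114"), (11, 70, 16.76, "15:50:11.955086"),
    (11, 30, 5.1, "15:50:11.99"),
]


def last_known_ratios(rows):
    """For each type-70 row, divide its data by the data of the most recent
    earlier type-30 row with the same id (None if there is no such row).

    Returns (id, type, time, result) tuples ordered by id, then time.
    Times are compared as strings, which assumes they share one layout.
    """
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[0]].append(row)

    out = []
    for row_id in sorted(by_id):
        last_30 = None  # data of the latest type-30 row seen so far for this id
        for _, typ, data, time in sorted(by_id[row_id], key=lambda r: r[3]):
            if typ == 30:
                last_30 = data
            elif typ == 70:
                result = data / last_30 if last_30 is not None else None
                out.append((row_id, typ, time, result))
    return out


for row in last_known_ratios(SAMPLE_ROWS):
    print(row)
```

The single pass per id is the same "carry the last type 30 value forward" idea that the SQL answers express with joins or window functions.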
I'd like to be able to run these SQL queries from Python using pyodbc.
Any help would be greatly appreciated! Thanks in advance!!
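For the pyodbc part: pyodbc follows the Python DB-API 2.0, so a minimal sketch is a generator that executes any of the queries below and streams the results in batches with `fetchmany` rather than loading everything with `fetchall`, which matters when the source table has tens of millions of rows. `iter_rows` and the default batch size are hypothetical choices. The connection string you would pass to `pyodbc.connect` depends on your driver and server, so it is not shown; the demo uses Python's built-in `sqlite3` only as a stand-in database.

```python
import sqlite3
from typing import Any, Iterator, Sequence


def iter_rows(conn: Any, sql: str, params: Sequence[Any] = (),
              batch_size: int = 10_000) -> Iterator[tuple]:
    """Execute sql on a DB-API 2.0 connection and yield rows batch by batch.

    Intended for a pyodbc connection; sqlite3 connections work the same way.
    iter_rows is a hypothetical helper, not part of either library.
    """
    cur = conn.cursor()
    try:
        # Both pyodbc and sqlite3 use "?" placeholders; only pass
        # parameters when there are some.
        if params:
            cur.execute(sql, params)
        else:
            cur.execute(sql)
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch:
                break
            for row in batch:
                yield tuple(row)  # e.g. pyodbc.Row -> plain tuple
    finally:
        cur.close()


if __name__ == "__main__":
    # Stand-in database; against SQL Server you would pass pyodbc.connect(...).
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE t (id INTEGER, type INTEGER, data REAL, time TEXT)")
    demo.executemany("INSERT INTO t VALUES (?, ?, ?, ?)",
                     [(1, 30, 4.0, "15:50:10.660777"), (1, 70, 11.5, "15:50:10.797966")])
    for row in iter_rows(demo, "SELECT id, type, data FROM t WHERE type = ?", (70,)):
        print(row)
    demo.close()
```

Writing the rows out as you iterate (to a file or another table) keeps memory use bounded by `batch_size` rather than by the size of the result.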
Answer 0 (score: 1)
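Assuming that, for each id, there is at least one type = 30 row before every type = 70 row, you can do this with OUTER APPLY: for each type = 70 row, get the max time of the type = 30 rows before it, and divide by the data value of that row.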
SELECT x.id,
x.type,
x.time,
x.data*1.0/t.data as result
FROM
(SELECT t.*,t1.maxtime_before
FROM t
OUTER APPLY
(SELECT max(time) AS maxtime_before
FROM t t1
WHERE t1.id=t.id AND t1.type=30 AND t1.time<t.time) t1
WHERE type = 70
) x
JOIN t ON t.id=x.id AND t.time=x.maxtime_before AND t.type=30
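`OUTER APPLY` is SQL Server syntax, so the query above cannot be tried as-is in a scratch database such as SQLite. The sketch below, using Python's built-in `sqlite3` module, swaps the `OUTER APPLY` for a correlated scalar subquery that computes the same `maxtime_before` for each type = 70 row; the join back to `t` also checks `t.type = 30`. It assumes `time` is stored as text in one layout so that `MAX` and `<` compare it correctly. `SAMPLE_ROWS` and `apply_style_ratios` are hypothetical names.

```python
import sqlite3

# The sample data from the question: (id, type, data, time).
SAMPLE_ROWS = [
    (1, 30, 3.9, "15:50:10.660555"), (1, 30, 4.0, "15:50:10.660777"),
    (1, 70, 11.5, "15:50:10.797966"), (1, 30, 4.1, "15:50:10.834444"),
    (1, 70, 12.6, "15:50:10.853114"), (1, 70, 16.7, "15:50:10.955086"),
    (1, 30, 5.0, "15:50:10.99"),
    (11, 30, 3.8, "15:50:11.660555"), (11, 30, 4.1, "15:50:11.660777"),
    (11, 70, 12.5, "15:50:11.797966"), (11, 30, 4.7, "15:50:11.834444"),
    (11, 70, 12.68, "15:50:11.853114"), (11, 70, 16.76, "15:50:11.955086"),
    (11, 30, 5.1, "15:50:11.99"),
]

# OUTER APPLY replaced by a correlated subquery (SQLite has no APPLY operator).
QUERY = """
SELECT x.id, x.type, x.time, x.data * 1.0 / t.data AS result
FROM (SELECT t.*,
             (SELECT MAX(t1.time)
              FROM t AS t1
              WHERE t1.id = t.id AND t1.type = 30 AND t1.time < t.time) AS maxtime_before
      FROM t
      WHERE t.type = 70) AS x
JOIN t ON t.id = x.id AND t.time = x.maxtime_before AND t.type = 30
ORDER BY x.id, x.time
"""


def apply_style_ratios(rows):
    """Load (id, type, data, time) rows into an in-memory table t and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER, type INTEGER, data REAL, time TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in apply_style_ratios(SAMPLE_ROWS):
    print(row)
```

On SQL Server itself the correlated-subquery form is also valid, so it can serve as a fallback if `OUTER APPLY` is unfamiliar; an index on `(id, type, time)` is what keeps either form from scanning the table once per type = 70 row.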
If a type = 70 row has no earlier type = 30 row, you can use the following to show null in the result column for that time:
WITH x AS
(SELECT t.*,
t1.maxtime_before
FROM t
OUTER APPLY
(SELECT max(time) AS maxtime_before
FROM t t1
WHERE t1.id=t.id AND t1.type=30 AND t1.time<t.time) t1
WHERE type = 70
)
SELECT x.id,
x.type,
x.time,
x.data*1.0/t.data as result
FROM t
JOIN x ON t.id=x.id AND t.time=x.maxtime_before AND t.type=30
UNION ALL
SELECT id,
type,
time,
NULL
FROM x
WHERE maxtime_before IS NULL
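To see the `NULL` branch in action, here is the same CTE and `UNION ALL` structure run in SQLite through Python's `sqlite3` module, with the `OUTER APPLY` replaced by a correlated subquery because SQLite does not support `APPLY`. The three rows are made-up test data (not from the question) in which id 2 has a type = 70 row before any type = 30 row; `ROWS_WITH_GAP` and `ratios_with_nulls` are hypothetical names.

```python
import sqlite3

# Made-up data: the first type-70 row for id 2 has no earlier type-30 row.
ROWS_WITH_GAP = [
    (2, 70, 9.0, "15:50:12.000000"),
    (2, 30, 3.0, "15:50:12.100000"),
    (2, 70, 6.0, "15:50:12.200000"),
]

QUERY = """
WITH x AS (
  SELECT t.*,
         (SELECT MAX(t1.time)
          FROM t AS t1
          WHERE t1.id = t.id AND t1.type = 30 AND t1.time < t.time) AS maxtime_before
  FROM t
  WHERE t.type = 70
)
SELECT x.id, x.type, x.time, x.data * 1.0 / t.data AS result
FROM t
JOIN x ON t.id = x.id AND t.time = x.maxtime_before AND t.type = 30
UNION ALL
SELECT id, type, time, NULL
FROM x
WHERE maxtime_before IS NULL
ORDER BY 1, 3
"""


def ratios_with_nulls(rows):
    """Load (id, type, data, time) rows into an in-memory table t and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER, type INTEGER, data REAL, time TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in ratios_with_nulls(ROWS_WITH_GAP):
    print(row)
# (2, 70, '15:50:12.000000', None)
# (2, 70, '15:50:12.200000', 2.0)
```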
Another approach is to use MAX as a window function to track, for each id, the running maximum time of the type = 30 rows.
WITH x AS
(SELECT t.*,
MAX(CASE WHEN type=30 THEN time END) OVER(PARTITION BY id ORDER BY time) AS running_max
FROM t
)
SELECT x.id,
x.type,
x.time,
x.data*1.0/t.data as result
FROM x
JOIN t ON t.id=x.id AND t.time=x.running_max AND t.type=30
WHERE x.type=70
UNION ALL
SELECT id,
type,
time,
NULL
FROM x
WHERE running_max IS NULL AND type=70
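The running-max version needs only a window function, so it can be tried in SQLite 3.25 or later (recent Python builds usually ship a new enough SQLite). This sketch runs it through Python's `sqlite3` module; it includes a `type = 70` condition on the `NULL` branch, so rows of any other type are not reported, and a `t.type = 30` join condition. The id 1 rows come from the question, the id 2 rows are made-up test data, and `running_max_ratios` is a hypothetical name.

```python
import sqlite3

QUERY = """
WITH x AS (
  SELECT t.*,
         MAX(CASE WHEN type = 30 THEN time END)
           OVER (PARTITION BY id ORDER BY time) AS running_max
  FROM t
)
SELECT x.id, x.type, x.time, x.data * 1.0 / t.data AS result
FROM x
JOIN t ON t.id = x.id AND t.time = x.running_max AND t.type = 30
WHERE x.type = 70
UNION ALL
SELECT id, type, time, NULL
FROM x
WHERE running_max IS NULL AND type = 70
ORDER BY 1, 3
"""


def running_max_ratios(rows):
    """Load (id, type, data, time) rows into an in-memory table t and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER, type INTEGER, data REAL, time TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# Id 1 from the question plus a made-up id 2 whose first type-70 row
# has no earlier type-30 row.
demo_rows = [
    (1, 30, 3.9, "15:50:10.660555"), (1, 30, 4.0, "15:50:10.660777"),
    (1, 70, 11.5, "15:50:10.797966"), (1, 30, 4.1, "15:50:10.834444"),
    (1, 70, 12.6, "15:50:10.853114"), (1, 70, 16.7, "15:50:10.955086"),
    (2, 70, 9.0, "15:50:12.000000"), (2, 30, 3.0, "15:50:12.100000"),
    (2, 70, 6.0, "15:50:12.200000"),
]
for row in running_max_ratios(demo_rows):
    print(row)
```

With `ORDER BY` in the `OVER` clause, the default frame runs from the start of the partition to the current row, which is what turns `MAX` into a running maximum.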
Answer 1 (score: 1)
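There is a way to do this using only window functions.
For each row, get the previous type and value. Also, number the rows so that each run of consecutive 70s can be identified as a group (a cumulative sum does this).
In the next step, use a partitioned max to get the type 30 value, and finally do the calculation.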
select t.*,
       data * 1.0 / data_30 as result   -- * 1.0 avoids integer division
from (select t.*,
             -- only type 70 rows: the first 70 of each run carries the preceding 30's value
             max(case when type = 70 and prev_type = 30 then prev_data end) over (partition by id, grp) as data_30
      from (select t.*,
                   sum(case when type <> 70 then 1 else 0 end) over (partition by id order by time) as grp,
                   lag(type) over (partition by id order by time) as prev_type,
                   lag(data) over (partition by id order by time) as prev_data
            from t
            where type in (30, 70)
           ) t
     ) t
where type = 70;
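This is an interesting point: by restricting the types to just 30 and 70, we guarantee that every run of 70s is immediately preceded by a 30 (unless the run comes before the first 30 for that id, in which case data_30 is NULL).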
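The window-function-only approach can also be checked in SQLite (3.25 or later) from Python. This sketch uses two guards: the `MAX` only looks at `prev_data` on type = 70 rows, and only type = 70 rows are returned. The made-up id 3 in the usage has a larger type 30 value followed by a smaller one. That is the case where taking the max over every row of the group would pick the wrong divisor, because the group's own type 30 row can also have `prev_type = 30`. `lag_group_ratios` is a hypothetical name.

```python
import sqlite3

QUERY = """
SELECT id, type, time, data * 1.0 / data_30 AS result
FROM (SELECT g.*,
             -- Only the first 70 of a run has prev_type = 30; its prev_data
             -- is the type-30 value for the whole group.
             MAX(CASE WHEN type = 70 AND prev_type = 30 THEN prev_data END)
               OVER (PARTITION BY id, grp) AS data_30
      FROM (SELECT t.*,
                   -- Each non-70 row starts a new group; the 70s after it share it.
                   SUM(CASE WHEN type <> 70 THEN 1 ELSE 0 END)
                     OVER (PARTITION BY id ORDER BY time) AS grp,
                   LAG(type) OVER (PARTITION BY id ORDER BY time) AS prev_type,
                   LAG(data) OVER (PARTITION BY id ORDER BY time) AS prev_data
            FROM t
            WHERE type IN (30, 70)) AS g) AS w
WHERE type = 70
ORDER BY id, time
"""


def lag_group_ratios(rows):
    """Load (id, type, data, time) rows into an in-memory table t and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER, type INTEGER, data REAL, time TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# Id 1 from the question plus a made-up id 3 (9.0 then 2.0 before a 70).
demo_rows = [
    (1, 30, 3.9, "15:50:10.660555"), (1, 30, 4.0, "15:50:10.660777"),
    (1, 70, 11.5, "15:50:10.797966"), (1, 30, 4.1, "15:50:10.834444"),
    (1, 70, 12.6, "15:50:10.853114"), (1, 70, 16.7, "15:50:10.955086"),
    (3, 30, 9.0, "15:50:13.000000"), (3, 30, 2.0, "15:50:13.100000"),
    (3, 70, 8.0, "15:50:13.200000"),
]
for row in lag_group_ratios(demo_rows):
    print(row)
```

Unlike the join-based queries, this one reads the table once and never joins it back to itself, which is the main reason to prefer it at the 30-million-row scale.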