I'm trying to optimize some queries against a large amount of data. I'll try to simplify the problem here. Let's start with a sample table:
CREATE TABLE [dbo].[TestTable]
(
[ProjectID] [INT] NOT NULL,
[Index] [INT] NOT NULL,
[Voltage] [DECIMAL](18, 3) NOT NULL,
[Current] [DECIMAL](18, 3) NOT NULL
)
Imagine we have the following data:
ProjectID Index Voltage Current
---------------------------------------
1 1 2.3 3.4
1 2 2.5 3.3
1 3 2.7 3.0
1 4 2.8 2.9
1 5 2.5 3.1
1 6 2.0 3.4
1 7 1.2 3.5
1 8 0.5 3.0
2 1 2.0 1.0
2 2 5.0 2.0
2 3 3.0 2.0
2 4 1.0 1.0
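For anyone who wants to reproduce this, a minimal sketch of a script that loads the rows listed above might look like the following. It assumes the table was created exactly as shown and is still empty.

```sql
-- Load the twelve sample rows into dbo.TestTable.
-- Assumes the table definition above and an empty table.
INSERT INTO [dbo].[TestTable] ([ProjectID], [Index], [Voltage], [Current])
VALUES
    (1, 1, 2.3, 3.4),
    (1, 2, 2.5, 3.3),
    (1, 3, 2.7, 3.0),
    (1, 4, 2.8, 2.9),
    (1, 5, 2.5, 3.1),
    (1, 6, 2.0, 3.4),
    (1, 7, 1.2, 3.5),
    (1, 8, 0.5, 3.0),
    (2, 1, 2.0, 1.0),
    (2, 2, 5.0, 2.0),
    (2, 3, 3.0, 2.0),
    (2, 4, 1.0, 1.0);
```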
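Effectively, my goal is to compute some aggregates between a starting point and an ending point, with rows ordered by the Index column. By starting and ending point I mean, for instance, starting at the first row where Voltage >= 2.5 and continuing through the last row where Voltage >= 1.5.
Here is a sample query to illustrate: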
WITH CTE AS
(
SELECT
StartingTable.ProjectID,
MIN(StartingTable.[Index]) StartingIndex,
MIN(EndingTable.[Index]) - 1 EndingIndex
FROM
TestTable StartingTable
JOIN TestTable EndingTable ON StartingTable.ProjectID = EndingTable.ProjectID
AND EndingTable.[Index] > StartingTable.[Index]
WHERE
StartingTable.Voltage >= 2.5
and EndingTable.Voltage <= 1.5
GROUP BY
StartingTable.ProjectID
)
SELECT
TestTable.ProjectID,
MAX(Voltage) MaxVoltage,
StartingIndex,
EndingIndex
FROM
TestTable
JOIN CTE ON TestTable.ProjectID = CTE.ProjectID
AND TestTable.[Index] >= StartingIndex
AND TestTable.[Index] <= EndingIndex
GROUP BY
TestTable.ProjectID,
StartingIndex,
EndingIndex
For the sample data, it should return:
ProjectID MaxVoltage StartingIndex EndingIndex
1 2.800 2 6
2 5.000 2 3
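That works, but I really don't like joining TestTable twice to get the starting and ending indexes. We're dealing with a table that I think could eventually hold terabytes of data, so I believe this is a poor choice. I just don't know what to do instead.
I was thinking of some approach using window functions, but I'm not sure it's possible. It's almost as if I want to do this: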
MAX(Voltage) OVER (PARTITION BY ProjectID ORDER BY [Index] ROWS BETWEEN Voltage >= 2.5 AND Voltage >= 1.5)
I haven't seen anything suggesting that is possible. I also came up with the following:
WITH CTE AS
(
SELECT
ProjectID,
[Index],
MAX(Voltage) OVER (PARTITION BY ProjectId ORDER BY [Index] ROWS UNBOUNDED PRECEDING) MaxVoltage
FROM
TestTable
)
SELECT
TestTable.ProjectID,
MAX(Voltage) MaxVoltage,
MIN(TestTable.[Index]) StartingIndex,
MAX(TestTable.[Index]) EndingIndex
FROM
TestTable
JOIN CTE ON TestTable.ProjectID = CTE.ProjectID
AND TestTable.[Index] = CTE.[Index]
WHERE
MaxVoltage >= 2.5
AND Voltage >= 1.5
GROUP BY
TestTable.ProjectID
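I'm not sure this is much better. Is there a better option than what I've already tried?
Answer 0: (score: 2)
If the voltage never rises above 2.5, drops below 1.5, and then climbs back above 1.5, you can apply conditional aggregation: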
SELECT
ProjectID,
max(Voltage) as MaxVoltage,
MIN(case when Voltage >= 2.5 then [index] end) AS StartingIndex,
MAX(case when Voltage >= 1.5 then [index] end) AS EndingIndex
FROM TestTable
group by ProjectID
having MAX(Voltage) >= 2.5 -- to filter group which never reached 2.5
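Edit:
If your Voltage goes through repeated runs between 2.5 and 1.5, @Clockwork-Muse's query #2 works fine as long as there are no gaps in the [index] column; otherwise it splits one result row into two groups. If you want to ignore gaps, the following select returns the expected result: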
with cte as
(
SELECT
ProjectID,
[Index],
Voltage,
max(case when Voltage < 1.5 then [Index] end)
over (partition by ProjectID
order by [Index]
rows unbounded preceding) AS grp -- same value for a range of rows >= 1.5
FROM TestTable
)
select
ProjectID,
max(Voltage) as MaxVoltage,
MIN(case when Voltage >= 2.5 then [index] end) AS StartingIndex,
MAX([index]) AS EndingIndex
from cte
where Voltage >=1.5
group by ProjectID, grp
having MAX(Voltage) >= 2.5 -- to filter group which never reached 2.5
order by ProjectID, grp
;
This groups consecutive rows with Voltage >= 1.5 and starts a new group whenever the voltage drops below 1.5; see Clockwork-Muse's modified db<>fiddle.
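To see how the grp column behaves, it can help to run the inner select on its own. The sketch below is just the CTE body from the query above with an ORDER BY added; it assumes the sample data from the question is loaded.

```sql
-- Inspect the running "last index where Voltage < 1.5" marker.
-- Every row of a run that stays >= 1.5 shares one grp value,
-- because the marker only changes on rows where Voltage < 1.5.
SELECT
    ProjectID,
    [Index],
    Voltage,
    MAX(CASE WHEN Voltage < 1.5 THEN [Index] END)
        OVER (PARTITION BY ProjectID
              ORDER BY [Index]
              ROWS UNBOUNDED PRECEDING) AS grp
FROM TestTable
ORDER BY ProjectID, [Index];
```

With the question's sample data, rows 1 to 6 of project 1 get grp = NULL (no row below 1.5 has been seen yet), while rows 7 and 8 get 7 and 8. Because grp is derived from the voltage values and not from consecutive [Index] values, a missing [Index] value does not split a run.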
Answer 1: (score: 0)
SELECT tt.ProjectID,
MAX(tt.Voltage) AS MaxVoltage,
x.StartIndex,
MAX(tt.[Index]) AS EndIndex
FROM TestTable AS tt
JOIN
(
SELECT ProjectID,
MIN([Index]) AS StartIndex
FROM TestTable
WHERE Voltage >= 2.5
GROUP BY ProjectID
) AS x ON tt.ProjectID = x.ProjectID
WHERE tt.Voltage >= 1.5
AND tt.[Index] >= x.StartIndex
GROUP BY tt.ProjectID, x.StartIndex
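See the full test here: https://rextester.com/BCVL10968
Answer 2: (score: 0)
If, as in your sample data set, the voltage only ever decreases once it has dropped to 1.5 volts (and the cycle never repeats), we can cheat by using conditional aggregation: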
SELECT [ProjectID], MAX([Voltage]) AS MaxVoltage,
MIN(CASE WHEN [Voltage] >= 2.5 THEN [Index] END) AS [StartingIndex],
MAX(CASE WHEN [Voltage] >= 1.5 THEN [Index] END) AS [EndingIndex]
FROM [dbo].[TestTable]
WHERE [Voltage] >= 1.5
GROUP BY [ProjectId]
HAVING MAX([Voltage]) >= 2.5
Example Fiddle
This yields the requested output:
ProjectID | MaxVoltage | StartingIndex | EndingIndex
--------: | :--------- | ------------: | ----------:
1 | 2.800 | 2 | 6
2 | 5.000 | 2 | 3
On the other hand, if we need to guard against restarts, things get more complicated, and we need to turn this into a variant of a gaps-and-islands solution:
SELECT [ProjectID], MAX([Voltage]) AS [MaxVoltage],
MIN(CASE WHEN [Voltage] >= 2.5 THEN [Index] END) AS [StartingIndex],
MAX(CASE WHEN [Voltage] >= 1.5 THEN [Index] END) AS [EndingIndex]
FROM (SELECT [ProjectId], [Index], [Voltage],
[Index] - ROW_NUMBER() OVER(PARTITION BY [ProjectID] ORDER BY [Index]) AS [VoltageRun]
FROM [dbo].[TestTable]
WHERE [Voltage] >= 1.5) [TestTable]
GROUP BY [ProjectID], [VoltageRun]
HAVING MAX([Voltage]) >= 2.5
ORDER BY [ProjectID], [VoltageRun]
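This works because your table conveniently stores a (hopefully gapless) [Index] column. By selecting only the rows that are valid (>= 1.5), the ROW_NUMBER() subtraction gives us a "grouping column". Before aggregation, the result set looks like this: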
ProjectId | Index | Voltage | VoltageRun
--------: | ----: | :------ | :---------
1 | 1 | 2.300 | 0
1 | 2 | 2.500 | 0
1 | 3 | 2.700 | 0
1 | 4 | 2.800 | 0
1 | 5 | 2.500 | 0
1 | 6 | 2.000 | 0
1 | 9 | 2.300 | 2
1 | 10 | 2.500 | 2
1 | 11 | 2.700 | 2
1 | 12 | 2.800 | 2
1 | 13 | 2.500 | 2
1 | 14 | 2.000 | 2
2 | 1 | 2.000 | 0
2 | 2 | 5.000 | 0
2 | 3 | 3.000 | 0
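(The test data for [ProjectID]=1 has been duplicated.)
After that, we just need to add the grouping column as an extra qualifier in the original query.
(Note that this type of query is one of the few cases where it makes sense to leave a grouping column out of the SELECT list.)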
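The answer does not show how the duplicated test data was produced. One way to get an equivalent data set, as a sketch that assumes the question's twelve sample rows are already loaded, is to copy project 1's rows with the index shifted by 8:

```sql
-- Assumption: dbo.TestTable currently holds only the question's sample rows,
-- so project 1 uses [Index] 1 through 8. Copying those rows with [Index] + 8
-- creates a second run at indexes 9 through 16.
INSERT INTO [dbo].[TestTable] ([ProjectID], [Index], [Voltage], [Current])
SELECT [ProjectID], [Index] + 8, [Voltage], [Current]
FROM [dbo].[TestTable]
WHERE [ProjectID] = 1;
```

With that data in place, the gaps-and-islands query above should return two rows for project 1 (indexes 2 to 6 and 10 to 14, each with a maximum voltage of 2.800) and one row for project 2 (indexes 2 to 3, maximum 5.000).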