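See my solution after the ###### marker below. I am on MS SQL Server 2012. I need to remove adjacent duplicate rows based on the Flow column below, keeping the first row of each run (marked with * for illustration). Then, across all remaining rows, take the time difference between each 1 and the following 0 to get the cumulative time.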
Record Number Downhole Time Flow
-------------------------------------------
0 03/27/2013 19:23:48.582 1 *
58 03/27/2013 19:28:12.606 1
137 03/27/2013 19:32:16.070 0 *
143 03/27/2013 19:33:59.070 0
255 03/27/2013 19:40:14.070 0
272 03/29/2013 14:43:55.071 1 *
289 03/29/2013 14:45:44.070 1
293 03/29/2013 14:45:59.071 0 *
294 03/29/2013 14:46:10.070 0
Result after removing adjacent duplicates:
Record Number Downhole Time Flow
-------------------------------------------
0 03/27/2013 19:23:48.582 1 *
137 03/27/2013 19:32:16.070 0 *
272 03/29/2013 14:43:55.071 1 *
293 03/29/2013 14:45:59.071 0 *
Final expected result: cumulative time difference = (03/27/2013 19:32:16.070 - 03/27/2013 19:23:48.582) + (03/29/2013 14:45:59.071 - 03/29/2013 14:43:55.071) + further pairs if there are more rows.
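As a sanity check on the expected result, the arithmetic can be worked through outside the database. The following is only a sketch in Python (not part of the original post); the two pairs are copied from the starred rows above, and the name `total_on_time` is made up for illustration.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%Y %H:%M:%S.%f"

# (flow-on time, flow-off time) pairs copied from the starred rows above
PAIRS = [
    ("03/27/2013 19:23:48.582", "03/27/2013 19:32:16.070"),
    ("03/29/2013 14:43:55.071", "03/29/2013 14:45:59.071"),
]

def total_on_time(pairs):
    """Sum (off - on) over all pairs; hypothetical helper for illustration."""
    total = timedelta()
    for start, end in pairs:
        total += datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return total

print(total_on_time(PAIRS).total_seconds())  # 631.488
```

So for the sample data the target is 507.488 s + 124.000 s = 631.488 s.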
Solution ####### The following reads much better in a SQL editor; just paste it in.
WITH FlowEvntTable AS (
/* the following gets raw data and adds Row# for the next select to use*/
Select
ROW_NUMBER() OVER (ORDER BY [Downhole Time]) AS RNum,
[Downhole Time],
[Record Number],
Value As Flow
FROM [newMDF].[dbo].[vLog]
where
[Event Name] like 'Flow%'
AND [Field Name] like 'Flow'
),
NoDuplicatesFlowTable AS (
/*the following line came from StackOverflow "ignore adjacent matching rows" */
Select [Downhole Time], [Flow] from FlowEvntTable A where A.RNum NOT IN (SELECT A.RNUM from FlowEvntTable A JOIN FlowEvntTable B ON B.RNum +1 = A.RNum AND B.Flow=A.Flow)
),
FlowOffColAddedTable AS (
Select *, lead([Downhole Time]) OVER (ORDER BY [Downhole Time]) AS NotFlowTime from NoDuplicatesFlowTable
),
FlowStartEndTimeTable AS (
/* the select above adds the time offset by 1 row as a new column; filtering on Flow = 1 (On) then gives Start/End pairs for each On period */
Select [Downhole Time] AS StartTime, NotFlowTime AS EndTime from FlowOffColAddedTable where Flow = 1
)
/*diff and sum the pairs*/
Select Sum(DATEDIFF(ms,StartTime,EndTime))/1000 AS VibeOnSec From
FlowStartEndTimeTable
Below is the intermediate result after the "Select *, lead ..." step above. Yes, it does not match the data above; it is only meant to give a rough idea.
Downhole Time Flow NotFlowTime
-------------------------------------------
2013-03-28 00:23:48.0000000 1 2013-03-28 00:32:16.0000000
2013-03-28 00:32:16.0000000 0 2013-03-28 00:33:59.0000000
2013-03-28 00:33:59.0000000 1 2013-03-28 00:40:14.0000000
2013-03-28 00:40:14.0000000 0 2013-03-29 19:43:55.0000000
2013-03-29 19:43:55.0000000 1 2013-03-29 19:45:44.0000000
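To make the CTE chain easier to follow, here is a rough Python trace of the same steps over the sample rows from the question. It is a sketch, not a replacement for the query: the function names are invented, and each comment names the CTE it is meant to mirror.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%Y %H:%M:%S.%f"

# (Downhole Time, Flow) rows from the question's sample table
ROWS = [
    ("03/27/2013 19:23:48.582", 1), ("03/27/2013 19:28:12.606", 1),
    ("03/27/2013 19:32:16.070", 0), ("03/27/2013 19:33:59.070", 0),
    ("03/27/2013 19:40:14.070", 0), ("03/29/2013 14:43:55.071", 1),
    ("03/29/2013 14:45:44.070", 1), ("03/29/2013 14:45:59.071", 0),
    ("03/29/2013 14:46:10.070", 0),
]

def flow_on_intervals(rows):
    # FlowEvntTable: order by time (list positions play the role of RNum)
    events = sorted((datetime.strptime(t, FMT), flow) for t, flow in rows)
    # NoDuplicatesFlowTable: drop a row when the previous row has the same Flow
    changes = [e for i, e in enumerate(events)
               if i == 0 or events[i - 1][1] != e[1]]
    # FlowOffColAddedTable: LEAD, i.e. the next row's time (None for the last row)
    next_times = [t for t, _ in changes[1:]] + [None]
    # FlowStartEndTimeTable: keep only the rows where Flow = 1
    return [(t, nxt) for (t, flow), nxt in zip(changes, next_times) if flow == 1]

def total_on_ms(intervals):
    # SUM ignores NULLs, so an interval that is still open adds nothing
    one_ms = timedelta(milliseconds=1)
    return sum((end - start) // one_ms
               for start, end in intervals if end is not None)

total_ms = total_on_ms(flow_on_intervals(ROWS))
print(total_ms // 1000)   # 631, integer division
print(total_ms / 1000.0)  # 631.488
```

One detail worth checking in the query itself: as far as I know, `DATEDIFF` returns an `int`, so `Sum(DATEDIFF(ms, ...))/1000` is integer division and drops the fractional seconds. Dividing by `1000.0` instead should keep them.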
Answer 0 (score: 0)
I am not sure which database you are using. Here is a solution that uses analytic functions on Oracle:
SELECT
un,
mytime,
flow,
lead (mytime) OVER (ORDER BY UN) lead_time,
(lead (mytime) OVER (ORDER BY UN) - mytime)*24*60 minutes
FROM ( SELECT un,
mytime,
flow,
LAG (flow) OVER (ORDER BY UN) lag_val
FROM test
ORDER BY un) a
WHERE a.flow != NVL (a.lag_val, 9999)
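The inner select uses the LAG analytic function to fetch the previous row's flow value. The WHERE clause of the outer select filters out the "duplicate" flows, leaving only the first event of each change. The outer select also uses the LEAD analytic function to compute the time difference in minutes. This should perform well even with a large amount of data. Let me know which database you are using. Most databases have an implementation of analytic functions (or a workaround); this version happens to be for Oracle.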
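For readers who do not have Oracle at hand, the LAG filter and the LEAD difference can be imitated in a few lines of Python. This is a sketch only: the rows are the question's sample data recast as (un, mytime, flow), and `changes_with_minutes` is a hypothetical name.

```python
from datetime import datetime

FMT = "%m/%d/%Y %H:%M:%S.%f"

# (un, mytime, flow): the question's sample rows
ROWS = [
    (0, "03/27/2013 19:23:48.582", 1), (58, "03/27/2013 19:28:12.606", 1),
    (137, "03/27/2013 19:32:16.070", 0), (143, "03/27/2013 19:33:59.070", 0),
    (255, "03/27/2013 19:40:14.070", 0), (272, "03/29/2013 14:43:55.071", 1),
    (289, "03/29/2013 14:45:44.070", 1), (293, "03/29/2013 14:45:59.071", 0),
    (294, "03/29/2013 14:46:10.070", 0),
]

def changes_with_minutes(rows):
    ordered = sorted((un, datetime.strptime(t, FMT), flow) for un, t, flow in rows)
    kept, lag_val = [], None
    for un, t, flow in ordered:
        # WHERE flow != NVL(lag_val, 9999): the sentinel keeps the first row
        if flow != (9999 if lag_val is None else lag_val):
            kept.append((un, t, flow))
        lag_val = flow
    result = []
    for i, (un, t, flow) in enumerate(kept):
        # LEAD runs over the rows that survived the WHERE clause
        lead_time = kept[i + 1][1] if i + 1 < len(kept) else None
        minutes = None if lead_time is None else (lead_time - t).total_seconds() / 60
        result.append((un, flow, minutes))
    return result

for row in changes_with_minutes(ROWS):
    print(row)
```

The last surviving row has no following row, so its minutes value is empty (NULL in SQL). To get the cumulative time from the question, sum the minutes of the rows where flow is 1.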
Answer 1 (score: 0)
I believe this does what you asked for:
WITH FlowIntervals AS (
SELECT
FromTime = Min(D.[Downhole Time]),
X.ToTime
FROM
dbo.vLog D
OUTER APPLY (
SELECT TOP 1 ToTime = D2.[Downhole Time]
FROM dbo.vLog D2
WHERE
D.[Downhole Time] < D2.[Downhole Time]
AND D.[Flow] <> D2.[Flow]
ORDER BY D2.[Downhole Time]
) X
WHERE D.Flow = 1
GROUP BY X.ToTime
)
SELECT Sum(DateDiff(ms, FromTime, IsNull(ToTime, GetDate())) / 1000.0)
FROM FlowIntervals
;
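This query works on SQL Server 2005 and later. It should perform reasonably well, but it requires a self-join of the vLog table, so it may not perform as well as a solution that uses LEAD or LAG.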
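The idea behind the OUTER APPLY query is: for every Flow = 1 row, look up the first later row with a different Flow, then collapse the rows that share that end time down to the earliest start. A small Python sketch of that idea follows, using the question's sample rows. The names are invented, and `now` stands in for `GetDate()`.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%Y %H:%M:%S.%f"

# (Downhole Time, Flow) rows from the question's sample table
ROWS = [
    ("03/27/2013 19:23:48.582", 1), ("03/27/2013 19:28:12.606", 1),
    ("03/27/2013 19:32:16.070", 0), ("03/27/2013 19:33:59.070", 0),
    ("03/27/2013 19:40:14.070", 0), ("03/29/2013 14:43:55.071", 1),
    ("03/29/2013 14:45:44.070", 1), ("03/29/2013 14:45:59.071", 0),
    ("03/29/2013 14:46:10.070", 0),
]

def flow_intervals(rows):
    events = sorted((datetime.strptime(t, FMT), flow) for t, flow in rows)
    earliest_start = {}  # ToTime -> Min(FromTime), like GROUP BY X.ToTime
    for t, flow in events:
        if flow != 1:  # WHERE D.Flow = 1
            continue
        # OUTER APPLY ... TOP 1: first later row whose Flow differs (None if absent)
        to_time = next((t2 for t2, f2 in events if t2 > t and f2 != flow), None)
        if to_time not in earliest_start or t < earliest_start[to_time]:
            earliest_start[to_time] = t
    return sorted(((start, end) for end, start in earliest_start.items()),
                  key=lambda pair: pair[0])

def total_seconds(intervals, now):
    # IsNull(ToTime, GetDate()): an interval that is still open is closed at `now`
    total = sum(((end or now) - start for start, end in intervals), timedelta())
    return total.total_seconds()

intervals = flow_intervals(ROWS)
print(total_seconds(intervals, now=datetime(2013, 4, 1)))  # 631.488
```

The inner lookup scans the rows again for every Flow = 1 row, which is the self-join cost mentioned above.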
If you are looking for the absolute best performance, this query may do the trick:
WITH Ranks AS (
SELECT
Grp =
Row_Number() OVER (ORDER BY [Downhole Time])
- Row_Number() OVER (PARTITION BY Flow ORDER BY [Downhole Time]),
[Downhole Time],
Flow
FROM dbo.vLog
), Ranges AS (
SELECT
Result = Row_Number() OVER (ORDER BY Min(R.[Downhole Time]), X.Num) / 2,
[Downhole Time] = Min(R.[Downhole Time]),
R.Flow, X.Num
FROM
Ranks R
CROSS JOIN (SELECT 1 UNION ALL SELECT 2) X (Num)
GROUP BY
R.Flow, R.Grp, X.Num
), FlowStates AS (
SELECT
FromTime = Min([Downhole Time]),
ToTime = CASE WHEN Count(*) = 1 THEN NULL ELSE Max([Downhole Time]) END,
Flow = IsNull(Min(CASE WHEN Num = 2 THEN Flow ELSE NULL END), Min(Flow))
FROM Ranges R
WHERE Result > 0
GROUP BY Result
)
SELECT
ElapsedSeconds =
Sum(DateDiff(ms, FromTime, IsNull(ToTime, GetDate())) / 1000.0)
FROM
FlowStates
WHERE
Flow = 1
;
With your sample data it returns 631.486000 (seconds). If you select just the rows from the FlowStates CTE, you get the following result:
FromTime ToTime Flow
----------------------- ----------------------- ----
2013-03-27 19:23:48.583 2013-03-27 19:32:16.070 1
2013-03-27 19:32:16.070 2013-03-29 14:43:55.070 0
2013-03-29 14:43:55.070 2013-03-29 14:45:59.070 1
2013-03-29 14:45:59.070 NULL 0
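This query runs on SQL Server 2005 and later, and should stack up very well in performance against any other solution, including those that use LEAD or LAG (which it simulates in a sneaky way). I don't promise it will win, but it should do well and may win after all.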
For details on what the query is doing, see this answer to a similar question.
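In case the linked answer is not at hand, the two tricks in the query can be traced by hand. The Python sketch below (hypothetical names, question's sample rows) mirrors the Ranks CTE, where the difference of two row numbers is constant within a run of equal Flow values, and then the Ranges/FlowStates CTEs, where every run start is duplicated and consecutive rows are paired up via `Row_Number() / 2`.

```python
from datetime import datetime

FMT = "%m/%d/%Y %H:%M:%S.%f"

# (Downhole Time, Flow) rows from the question's sample table
ROWS = [
    ("03/27/2013 19:23:48.582", 1), ("03/27/2013 19:28:12.606", 1),
    ("03/27/2013 19:32:16.070", 0), ("03/27/2013 19:33:59.070", 0),
    ("03/27/2013 19:40:14.070", 0), ("03/29/2013 14:43:55.071", 1),
    ("03/29/2013 14:45:44.070", 1), ("03/29/2013 14:45:59.071", 0),
    ("03/29/2013 14:46:10.070", 0),
]

def island_starts(rows):
    events = sorted((datetime.strptime(t, FMT), flow) for t, flow in rows)
    seen_per_flow, starts = {}, {}
    for overall_rn, (t, flow) in enumerate(events, start=1):
        seen_per_flow[flow] = seen_per_flow.get(flow, 0) + 1
        grp = overall_rn - seen_per_flow[flow]  # Ranks.Grp
        starts.setdefault((flow, grp), t)       # first seen is Min(time) per run
    return sorted((t, flow) for (flow, _), t in starts.items())

def flow_states(starts):
    # CROSS JOIN with (1, 2): every run start appears twice
    doubled = sorted((t, num, flow) for t, flow in starts for num in (1, 2))
    groups = {}
    for rn, row in enumerate(doubled, start=1):
        if rn // 2 > 0:  # Result = Row_Number() / 2, WHERE Result > 0
            groups.setdefault(rn // 2, []).append(row)
    states = []
    for result in sorted(groups):
        members = groups[result]
        times = [t for t, _, _ in members]
        to_time = max(times) if len(members) > 1 else None
        flow = next(f for _, num, f in members if num == 2)
        states.append((min(times), to_time, flow))
    return states

states = flow_states(island_starts(ROWS))
on_seconds = sum((to - frm).total_seconds()
                 for frm, to, flow in states if flow == 1 and to is not None)
print(on_seconds)
```

With the timestamps exactly as typed in the question, the two Flow = 1 intervals add up to 631.488 s. The 631.486 reported above is consistent with the `datetime` type rounding to roughly 3 ms steps (note .582 shown as .583 and .071 as .070 in the FlowStates output), though that explanation is my assumption. The sketch also leaves the open final interval out of the sum, whereas the query would close it with `GetDate()` if its Flow were 1.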
Finally, for completeness, here is the LAG/LEAD version for SQL Server:
WITH StateChanges AS (
SELECT
[Downhole Time],
Flow,
Lag(Flow) OVER (ORDER BY [Downhole Time]) PrevFlow
FROM
dbo.vLog
), Durations AS (
SELECT
[Downhole Time],
Lead([Downhole Time]) OVER (ORDER BY [Downhole Time]) NextTime,
Flow
FROM
StateChanges
WHERE
Flow <> PrevFlow
OR PrevFlow IS NULL
)
SELECT ElapsedTime = Sum(DateDiff(ms, [Downhole Time], NextTime) / 1000.0)
FROM Durations
WHERE Flow = 1
;
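This query requires SQL Server 2012 or later. It computes the state changes (changes in flow), keeps only the rows where the flow actually changed, and finally sums the durations of the intervals that begin with flow going from 0 to 1 (flow start).
I would be interested to see actual performance results (I/O and time) for this query against the others. If you look only at the execution plans, this query appears to win, but it may not be the clear winner once real performance statistics are measured.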
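Answer 2 (score: 0)
I posted my answer right after my question (see the solution after the ###### marker above). Thank you all for the tidbits.
PS: I was still figuring out the Stack Overflow editor/system, so my answer was added to the question itself some time after it was posted. Sorry about that.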