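So, I have some history tables with begin and end dates. The problem is that the table contains several records that refer to the same thing, but their begin and end dates are not exact. So I'm trying to unify their begin and end dates.
So each cluster of records has begin and end dates that are close together (within about 7 seconds). Then there will be another cluster with the same key (VoyageID in this case) but a different set of close begin and end dates. Does that make sense? If not, I can post some sample data.
Anyway, my goal right now is to find the minimum begin date for each cluster. What I have right now gives the minimum for each VoyageID. Any help would be appreciated. Thanks!
This is what I have: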
DECLARE @7S DATETIME
SET @7S = '0:0:07'
PRINT @7S
SELECT MAX(T1.BeginDate), T1.VoyageID FROM
hist.VoyageProfitLossValues T1 INNER JOIN
hist.VoyageProfitLossValues T2 ON
T1.VoyageID = T2.VoyageID AND
T1.BeginDate BETWEEN (T2.BeginDate - @7S) and (T2.BeginDate + @7S)
GROUP BY T1.VoyageID
Edit: sample data:
BeginDate EndDate VoyageID
2011-07-05 07:02:50.713 2011-07-05 07:25:53.007 6312
2011-07-05 07:02:50.870 2011-07-05 07:25:53.693 6312
2011-07-05 07:02:51.027 2011-07-05 07:25:54.387 6312
2011-07-08 14:22:21.147 NULL 6312
2011-07-08 14:22:21.163 NULL 6312
2011-07-08 14:22:21.177 NULL 6312
Note: the real data has more than 3 records per voyage, and the BeginDates can be farther apart.
What I want out of this:
BeginDate VoyageID
2011-07-05 07:02:50.713 6312
2011-07-08 14:22:21.147 6312
What I have only gives me the first row.
I'll eventually use the end dates too, but I can easily adapt a solution for one to the other.
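Answer 0 (score: 2)
The idea of this solution is to order the rows by BeginDate within each VoyageID. Starting from the top, pick the rows where the time difference to the previous row is more than 7 seconds.
The table variable @Voy is used in place of hist.VoyageProfitLossValues. First I create a temp table #T whose ID column is populated with an ordered value for each VoyageID. C is a recursive CTE that starts at ID = 1 and walks through all rows, comparing the current row with the previous one and storing the result in the column FirstDate. I added a second VoyageID to the sample data to show that it works for that as well.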
declare @Voy table
(
BeginDate datetime,
EndDate datetime,
VoyageID int
)
insert into @Voy values
('2011-07-05 07:02:50.713', '2011-07-05 07:25:53.007', 6312),
('2011-07-05 07:02:50.870', '2011-07-05 07:25:53.693', 6312),
('2011-07-05 07:02:51.027', '2011-07-05 07:25:54.387', 6312),
('2011-07-08 14:22:21.147', NULL , 6312),
('2011-07-08 14:22:21.163', NULL , 6312),
('2011-07-08 14:22:21.177', NULL , 6312),
('2011-07-05 07:02:50.713', '2011-07-05 07:25:53.007', 6313),
('2011-07-05 07:02:50.870', '2011-07-05 07:25:53.693', 6313),
('2011-07-05 07:02:51.027', '2011-07-05 07:25:54.387', 6313),
('2011-07-08 14:22:21.147', NULL , 6313),
('2011-07-08 14:22:21.163', NULL , 6313),
('2011-07-08 14:22:21.177', NULL , 6313)
create table #T
(
ID int,
VoyageID int,
BeginDate datetime,
primary key (ID, VoyageID)
)
insert into #T (ID, VoyageID, BeginDate)
select row_number() over(partition by VoyageID order by BeginDate),
VoyageID,
BeginDate
from @Voy
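;with C as
(
-- anchor: the earliest row of each VoyageID always starts a cluster
select T.ID,
T.VoyageID,
T.BeginDate,
1 as FirstDate
from #T as T
where T.ID = 1
union all
-- recursive step: flag a row when it is more than 7 seconds after the previous row
select T.ID,
T.VoyageID,
T.BeginDate,
case when datediff(second, C.BeginDate, T.BeginDate) > 7 then 1 else 0 end
from #T as T
inner join C
on T.ID = C.ID + 1 and
T.VoyageID = C.VoyageID
)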
select C.BeginDate,
C.VoyageID
from C
where C.FirstDate = 1
order by C.VoyageID,
C.BeginDate
option (maxrecursion 0)
drop table #T
Result:
BeginDate VoyageID
----------------------- -----------
2011-07-05 07:02:50.713 6312
2011-07-08 14:22:21.147 6312
2011-07-05 07:02:50.713 6313
2011-07-08 14:22:21.147 6313
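The same "compare each row with the previous one" idea can also be sketched without the temp table or the recursive CTE. This is only a sketch and rests on an assumption that is not stated in the question: SQL Server 2012 or later, where the LAG window function is available. The CTE name Ordered and the column PrevBeginDate are made up for illustration, and @Voy is again a stand-in for hist.VoyageProfitLossValues.

```sql
-- Assumption: SQL Server 2012 or later (LAG does not exist in earlier versions).
-- @Voy stands in for hist.VoyageProfitLossValues, as in the answer above.
declare @Voy table
(
  BeginDate datetime,
  EndDate datetime,
  VoyageID int
);

insert into @Voy values
('2011-07-05 07:02:50.713', '2011-07-05 07:25:53.007', 6312),
('2011-07-05 07:02:50.870', '2011-07-05 07:25:53.693', 6312),
('2011-07-05 07:02:51.027', '2011-07-05 07:25:54.387', 6312),
('2011-07-08 14:22:21.147', NULL, 6312),
('2011-07-08 14:22:21.163', NULL, 6312),
('2011-07-08 14:22:21.177', NULL, 6312);

-- Ordered and PrevBeginDate are hypothetical names used only in this sketch.
with Ordered as
(
  select VoyageID,
         BeginDate,
         -- BeginDate of the previous row within the same VoyageID (NULL for the first row)
         lag(BeginDate) over(partition by VoyageID order by BeginDate) as PrevBeginDate
  from @Voy
)
select BeginDate,
       VoyageID
from Ordered
where PrevBeginDate is null                               -- first row of a voyage
   or datediff(second, PrevBeginDate, BeginDate) > 7      -- gap of more than 7 seconds
order by VoyageID,
         BeginDate;
```

For the sample rows of VoyageID 6312 this keeps the rows with BeginDate 2011-07-05 07:02:50.713 and 2011-07-08 14:22:21.147. Like the recursive version, it compares each row with the row directly before it, not with the first row of the cluster.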
Answer 1 (score: 0)
This approach uses a cursor. I don't know whether it suits you:
create table #datacluster (
dateCluster datetime,
dateV datetime primary key)
DECLARE @7S DATETIME
DECLARE @base DATETIME
DECLARE @begindate DATETIME
SELECT @base = SYSDATETIME()
SET @7S = '0:0:07'
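DECLARE cursor1 CURSOR
FAST_FORWARD READ_ONLY FOR
SELECT distinct T1.BeginDate
FROM
hist.VoyageProfitLossValues T1
ORDER BY T1.BeginDate DESC
-- the cursor must be opened before the first fetch
OPEN cursor1
FETCH NEXT FROM cursor1
INTO @begindate;
WHILE @@FETCH_STATUS = 0
BEGIN
-- more than 7 seconds before the current base: start a new cluster
-- (with DESC ordering, the base is the latest BeginDate of each cluster)
IF @base - @7S > @begindate
BEGIN
set @base = @begindate
END
insert into #datacluster ( dateCluster, dateV)
values (@base, @begindate)
FETCH NEXT FROM cursor1
INTO @begindate;
END
CLOSE cursor1
DEALLOCATE cursor1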
Update the VoyageProfitLossValues table from #datacluster:
UPDATE hist.VoyageProfitLossValues
SET BeginDate = (
SELECT C.dateCluster
FROM #datacluster C
WHERE
C.dateV = hist.VoyageProfitLossValues.BeginDate
)
Note 1: Untested!!
Optimizations:
Primary key on the temp table. Fast-forward, read-only cursor.
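For comparison, assigning a unified begin date to every row can also be sketched set-based, without a cursor. This is a hedged sketch, not a tested replacement: it assumes SQL Server 2012 or later (LAG and a windowed SUM with ORDER BY), it clusters per VoyageID, and it uses the table variable @Voy in place of hist.VoyageProfitLossValues. The names Flagged, Numbered, IsNewCluster, ClusterNo, ClusterBeginDate and ClusterEndDate are hypothetical, and taking the latest EndDate of a cluster is an assumption about what "unified" should mean.

```sql
-- Assumption: SQL Server 2012 or later.
declare @Voy table
(
  BeginDate datetime,
  EndDate datetime,
  VoyageID int
);

insert into @Voy values
('2011-07-05 07:02:50.713', '2011-07-05 07:25:53.007', 6312),
('2011-07-05 07:02:50.870', '2011-07-05 07:25:53.693', 6312),
('2011-07-05 07:02:51.027', '2011-07-05 07:25:54.387', 6312),
('2011-07-08 14:22:21.147', NULL, 6312),
('2011-07-08 14:22:21.163', NULL, 6312),
('2011-07-08 14:22:21.177', NULL, 6312);

with Flagged as
(
  -- 1 when the row is more than 7 seconds after the previous row of the same voyage;
  -- the first row has no previous row, so the comparison is unknown and the flag is 0
  select VoyageID,
         BeginDate,
         EndDate,
         case when datediff(second,
                            lag(BeginDate) over(partition by VoyageID order by BeginDate),
                            BeginDate) > 7
              then 1 else 0 end as IsNewCluster
  from @Voy
),
Numbered as
(
  -- a running total of the flags gives every row of a cluster the same number
  select VoyageID,
         BeginDate,
         EndDate,
         sum(IsNewCluster) over(partition by VoyageID
                                order by BeginDate
                                rows unbounded preceding) as ClusterNo
  from Flagged
)
select VoyageID,
       BeginDate,
       EndDate,
       min(BeginDate) over(partition by VoyageID, ClusterNo) as ClusterBeginDate,
       -- assumption: the unified end date is the latest EndDate in the cluster
       max(EndDate) over(partition by VoyageID, ClusterNo) as ClusterEndDate
from Numbered
order by VoyageID,
         BeginDate;
```

Each row keeps its original dates next to the cluster-wide values, so the result can be inspected before it is used in an UPDATE.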