I have a table (I've simplified the example) containing the following data:
Row Start Finish ID Amount
--- --------- ---------- -- ------
1 2008-10-01 2008-10-02 01 10
2 2008-10-02 2008-10-03 02 20
3 2008-10-03 2008-10-04 01 38
4 2008-10-04 2008-10-05 01 23
5 2008-10-05 2008-10-06 03 14
6 2008-10-06 2008-10-07 02 3
7 2008-10-07 2008-10-08 02 8
8 2008-10-08 2008-11-08 03 19
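The dates represent a period of time, the ID is the state the system was in during that period, and the amount is a value associated with that state.
What I want to do is aggregate the Amounts of adjacent rows that have the same ID, while keeping the same overall order, so that consecutive runs are combined. I therefore want to end up with data like this: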
Row Start Finish ID Amount
--- --------- ---------- -- ------
1 2008-10-01 2008-10-02 01 10
2 2008-10-02 2008-10-03 02 20
3 2008-10-03 2008-10-05 01 61
4 2008-10-05 2008-10-06 03 14
5 2008-10-06 2008-10-08 02 11
6 2008-10-08 2008-11-08 03 19
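I am after a T-SQL solution that can be put into a stored procedure, but I can't see how to do this with simple queries. I suspect it may require iteration of some sort, but I don't want to go down that path.
The reason I want this aggregation is that the next step in the process is a SUM() and a Count() grouped by the unique IDs that occur in the sequence, so that my final data would look something like: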
ID Counts Total
-- ------ -----
01 2 71
02 2 31
03 2 33
However, if I do a simple
SELECT ID, COUNT(ID) AS Counts, SUM(Amount) AS Total FROM data GROUP BY ID
on the original table, I get something like
ID Counts Total
-- ------ -----
01 3 71
02 3 31
03 2 33
which is not what I want.
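Answer 0: (score: 4)
If you read the book "Developing Time-Oriented Database Applications in SQL" by R T Snodgrass (a PDF of which is available from his web site under publications) and get as far as Figure 6.25 on p165-166, you will find the non-trivial SQL that can be used in the current example to group the rows that have the same ID value and continuous time intervals.
The query development below is close to correct, but there is a problem, spotted right at the end, that has its source in the first SELECT statement. I have not yet tracked down why the incorrect answer is being given. [If someone can test the SQL on their DBMS and tell me whether the first query works correctly there, it would be a great help!]
It looks something like: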
-- Derived from Figure 6.25 from Snodgrass "Developing Time-Oriented
-- Database Applications in SQL"
CREATE TABLE Data
(
Start DATE,
Finish DATE,
ID CHAR(2),
Amount INT
);
INSERT INTO Data VALUES('2008-10-01', '2008-10-02', '01', 10);
INSERT INTO Data VALUES('2008-10-02', '2008-10-03', '02', 20);
INSERT INTO Data VALUES('2008-10-03', '2008-10-04', '01', 38);
INSERT INTO Data VALUES('2008-10-04', '2008-10-05', '01', 23);
INSERT INTO Data VALUES('2008-10-05', '2008-10-06', '03', 14);
INSERT INTO Data VALUES('2008-10-06', '2008-10-07', '02', 3);
INSERT INTO Data VALUES('2008-10-07', '2008-10-08', '02', 8);
INSERT INTO Data VALUES('2008-10-08', '2008-11-08', '03', 19);
SELECT DISTINCT F.ID, F.Start, L.Finish
FROM Data AS F, Data AS L
WHERE F.Start < L.Finish
AND F.ID = L.ID
-- There are no gaps between F.Finish and L.Start
AND NOT EXISTS (SELECT *
FROM Data AS M
WHERE M.ID = F.ID
AND F.Finish < M.Start
AND M.Start < L.Start
AND NOT EXISTS (SELECT *
FROM Data AS T1
WHERE T1.ID = F.ID
AND T1.Start < M.Start
AND M.Start <= T1.Finish))
-- Cannot be extended further
AND NOT EXISTS (SELECT *
FROM Data AS T2
WHERE T2.ID = F.ID
AND ((T2.Start < F.Start AND F.Start <= T2.Finish)
OR (T2.Start <= L.Finish AND L.Finish < T2.Finish)));
The output from that query is:
01 2008-10-01 2008-10-02
01 2008-10-03 2008-10-05
02 2008-10-02 2008-10-03
02 2008-10-06 2008-10-08
03 2008-10-05 2008-10-06
03 2008-10-05 2008-11-08
03 2008-10-08 2008-11-08
Edited: There is a problem with the penultimate row; it should not be there. And I'm not clear (yet) where it is coming from.
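One possible explanation, offered as a sketch rather than as a checked reading of the book: the gap test only examines rows M whose Start lies strictly between F.Finish and L.Start, so L itself is never examined, and a gap immediately before L goes unnoticed. That is the situation for the two ID 03 rows. The Python script below uses the standard sqlite3 module purely as a convenient test bed (my assumption, not the DBMS used in this answer). It runs the query as posted and a variant whose gap test ranges over F.Start < M.Start < L.Finish. The variant has only been checked against this sample data.

```python
import sqlite3

ROWS = [
    ("2008-10-01", "2008-10-02", "01", 10),
    ("2008-10-02", "2008-10-03", "02", 20),
    ("2008-10-03", "2008-10-04", "01", 38),
    ("2008-10-04", "2008-10-05", "01", 23),
    ("2008-10-05", "2008-10-06", "03", 14),
    ("2008-10-06", "2008-10-07", "02", 3),
    ("2008-10-07", "2008-10-08", "02", 8),
    ("2008-10-08", "2008-11-08", "03", 19),
]

# {gap} is replaced by the range of M.Start values that the gap test examines.
QUERY = """
SELECT DISTINCT F.ID, F.Start, L.Finish
  FROM Data AS F, Data AS L
 WHERE F.Start < L.Finish
   AND F.ID = L.ID
   AND NOT EXISTS (SELECT *
                     FROM Data AS M
                    WHERE M.ID = F.ID
                      AND {gap}
                      AND NOT EXISTS (SELECT *
                                        FROM Data AS T1
                                       WHERE T1.ID = F.ID
                                         AND T1.Start < M.Start
                                         AND M.Start <= T1.Finish))
   AND NOT EXISTS (SELECT *
                     FROM Data AS T2
                    WHERE T2.ID = F.ID
                      AND ((T2.Start < F.Start AND F.Start <= T2.Finish)
                        OR (T2.Start <= L.Finish AND L.Finish < T2.Finish)))
 ORDER BY F.ID, F.Start, L.Finish
"""

AS_POSTED = "F.Finish < M.Start AND M.Start < L.Start"
WIDENED = "F.Start < M.Start AND M.Start < L.Finish"  # hypothetical adjustment


def maximal_periods(gap_test):
    """Load the sample rows into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Data (Start TEXT, Finish TEXT, ID TEXT, Amount INTEGER)"
    )
    conn.executemany("INSERT INTO Data VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY.format(gap=gap_test)).fetchall()
    conn.close()
    return rows


posted = maximal_periods(AS_POSTED)
widened = maximal_periods(WIDENED)

print("as posted:", len(posted), "rows")
for row in widened:
    print(*row)
```

With the widened test, the row 03 2008-10-05 2008-11-08 drops out because L's own Start (2008-10-08) is not covered by any earlier ID 03 period. Whether this matches the condition printed in the book is something I have not confirmed.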
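Now we need to treat that complex expression as a query expression in the FROM clause of another SELECT statement, which sums the Amount values for a given ID over the entries that fall within the maximal ranges shown above.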
SELECT M.ID, M.Start, M.Finish, SUM(D.Amount)
FROM Data AS D,
(SELECT DISTINCT F.ID, F.Start, L.Finish
FROM Data AS F, Data AS L
WHERE F.Start < L.Finish
AND F.ID = L.ID
-- There are no gaps between F.Finish and L.Start
AND NOT EXISTS (SELECT *
FROM Data AS M
WHERE M.ID = F.ID
AND F.Finish < M.Start
AND M.Start < L.Start
AND NOT EXISTS (SELECT *
FROM Data AS T1
WHERE T1.ID = F.ID
AND T1.Start < M.Start
AND M.Start <= T1.Finish))
-- Cannot be extended further
AND NOT EXISTS (SELECT *
FROM Data AS T2
WHERE T2.ID = F.ID
AND ((T2.Start < F.Start AND F.Start <= T2.Finish)
OR (T2.Start <= L.Finish AND L.Finish < T2.Finish)))) AS M
WHERE D.ID = M.ID
AND M.Start <= D.Start
AND M.Finish >= D.Finish
GROUP BY M.ID, M.Start, M.Finish
ORDER BY M.ID, M.Start;
This gives:
ID Start Finish Amount
01 2008-10-01 2008-10-02 10
01 2008-10-03 2008-10-05 61
02 2008-10-02 2008-10-03 20
02 2008-10-06 2008-10-08 11
03 2008-10-05 2008-10-06 14
03 2008-10-05 2008-11-08 33 -- Here be trouble!
03 2008-10-08 2008-11-08 19
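Edited: This is almost the correct data set on which to do the COUNT and SUM aggregation requested by the original question, so the final answer is: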
SELECT I.ID, COUNT(*) AS Number, SUM(I.Amount) AS Amount
FROM (SELECT M.ID, M.Start, M.Finish, SUM(D.Amount) AS Amount
FROM Data AS D,
(SELECT DISTINCT F.ID, F.Start, L.Finish
FROM Data AS F, Data AS L
WHERE F.Start < L.Finish
AND F.ID = L.ID
-- There are no gaps between F.Finish and L.Start
AND NOT EXISTS
(SELECT *
FROM Data AS M
WHERE M.ID = F.ID
AND F.Finish < M.Start
AND M.Start < L.Start
AND NOT EXISTS
(SELECT *
FROM Data AS T1
WHERE T1.ID = F.ID
AND T1.Start < M.Start
AND M.Start <= T1.Finish))
-- Cannot be extended further
AND NOT EXISTS
(SELECT *
FROM Data AS T2
WHERE T2.ID = F.ID
AND ((T2.Start < F.Start AND F.Start <= T2.Finish) OR
(T2.Start <= L.Finish AND L.Finish < T2.Finish)))
) AS M
WHERE D.ID = M.ID
AND M.Start <= D.Start
AND M.Finish >= D.Finish
GROUP BY M.ID, M.Start, M.Finish
) AS I
GROUP BY I.ID
ORDER BY I.ID;
id number amount
01 2 71
02 2 31
03 3 66
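Review: Oh! Drat... the entry for 03 has twice the 'amount' that it should have, and a count of 3 instead of 2. The earlier 'Edited' notes indicate where things started to go wrong. It looks as though either the first query is subtly wrong (maybe it is intended for a different question), or the optimizer I'm working with is misbehaving. Nevertheless, there should be an answer closely related to this one that gives the correct values.
For the record: tested on IBM Informix Dynamic Server 11.50 on Solaris 10. However, it should work fine on any other moderately standard-conforming SQL DBMS.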
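For comparison, here is a different set-based technique from the one developed in this answer. Number the rows once over the whole table and once within each ID, both ordered by Start; the difference between the two numbers stays constant along a run of adjacent rows with the same ID. It relies on the ROW_NUMBER window function (which T-SQL also provides) and on the assumption that "adjacent" means adjacent in Start order, as in the sample data; it does not check that one period's Finish equals the next period's Start. The sketch drives the SQL from Python's sqlite3 module, which needs SQLite 3.25 or later for window functions. The names Numbered, Runs and Grp are made up for the example.

```python
import sqlite3

ROWS = [
    ("2008-10-01", "2008-10-02", "01", 10),
    ("2008-10-02", "2008-10-03", "02", 20),
    ("2008-10-03", "2008-10-04", "01", 38),
    ("2008-10-04", "2008-10-05", "01", 23),
    ("2008-10-05", "2008-10-06", "03", 14),
    ("2008-10-06", "2008-10-07", "02", 3),
    ("2008-10-07", "2008-10-08", "02", 8),
    ("2008-10-08", "2008-11-08", "03", 19),
]

# Grp is constant within a run of adjacent rows that share an ID.
RUNS_CTE = """
WITH Numbered AS (
    SELECT Start, Finish, ID, Amount,
           ROW_NUMBER() OVER (ORDER BY Start)
         - ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Start) AS Grp
      FROM Data
),
Runs AS (
    SELECT ID, MIN(Start) AS Start, MAX(Finish) AS Finish,
           SUM(Amount) AS Amount
      FROM Numbered
     GROUP BY ID, Grp
)
"""

RUNS_SQL = RUNS_CTE + "SELECT Start, Finish, ID, Amount FROM Runs ORDER BY Start"
TOTALS_SQL = RUNS_CTE + (
    "SELECT ID, COUNT(*) AS Counts, SUM(Amount) AS Total "
    "FROM Runs GROUP BY ID ORDER BY ID"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (Start TEXT, Finish TEXT, ID TEXT, Amount INTEGER)")
conn.executemany("INSERT INTO Data VALUES (?, ?, ?, ?)", ROWS)

runs = conn.execute(RUNS_SQL).fetchall()      # one row per merged run
totals = conn.execute(TOTALS_SQL).fetchall()  # runs counted and summed per ID
conn.close()

for row in runs:
    print(*row)
for row in totals:
    print(*row)
```

On the sample data this produces the six merged rows from the question, and the per-ID summary 01 2 71, 02 2 31, 03 2 33.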
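Answer 1: (score: 1)
You probably need to create a cursor and loop through the results, keeping track of which ID you are working with and accumulating the data along the way. When the ID changes, you can insert the accumulated data into a temporary table, and return the table at the end of the procedure (select all from it). A table-valued function might be better, as you can then just insert into the return table as you go along.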
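To make the bookkeeping of that loop concrete, here is a sketch of the control flow in Python rather than T-SQL. A list stands in for the cursor and another list for the temporary table; the function name merge_runs is invented for the illustration, and the rows are assumed to arrive ordered by Start.

```python
from collections import defaultdict


def merge_runs(rows):
    """Merge adjacent rows with the same ID.

    rows: iterable of (start, finish, id, amount), already ordered by start.
    Returns a list of merged (start, finish, id, amount) tuples.
    """
    merged = []      # stands in for the temporary table
    current = None   # run being accumulated: [start, finish, id, amount]
    for start, finish, row_id, amount in rows:
        if current is not None and current[2] == row_id:
            # Same ID as the previous row: extend the run and add the amount.
            current[1] = finish
            current[3] += amount
        else:
            # ID changed: store the finished run and start a new one.
            if current is not None:
                merged.append(tuple(current))
            current = [start, finish, row_id, amount]
    if current is not None:
        merged.append(tuple(current))  # do not forget the last run
    return merged


rows = [
    ("2008-10-01", "2008-10-02", "01", 10),
    ("2008-10-02", "2008-10-03", "02", 20),
    ("2008-10-03", "2008-10-04", "01", 38),
    ("2008-10-04", "2008-10-05", "01", 23),
    ("2008-10-05", "2008-10-06", "03", 14),
    ("2008-10-06", "2008-10-07", "02", 3),
    ("2008-10-07", "2008-10-08", "02", 8),
    ("2008-10-08", "2008-11-08", "03", 19),
]

merged = merge_runs(rows)

# Second step from the question: count and sum the merged runs per ID.
summary = defaultdict(lambda: [0, 0])
for _, _, row_id, amount in merged:
    summary[row_id][0] += 1
    summary[row_id][1] += amount

for row_id in sorted(summary):
    print(row_id, *summary[row_id])
```

The one step that is easy to miss in a cursor version is the final flush after the loop ends, since no ID change follows the last row.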
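Answer 2: (score: 1)
"I suspect it may require iteration of some sort, but I don't want to go down that path."
I think that is the route you will have to take: use a cursor to populate a table variable. If you have a large number of records, you could use a permanent table to store the results; then, when you need to retrieve the data, you only have to process the new rows.
I would add a bit field with a default of 0 to the source table to keep track of which records have been processed. Assuming nobody is using select * on the table, adding a column with a default value won't affect the rest of your application.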
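The tracking column might look like the following. This is a sketch using Python's sqlite3 module as a stand-in for SQL Server; the column name Processed and the helper take_unprocessed are invented here. In T-SQL the column would be added with something like ALTER TABLE Data ADD Processed BIT NOT NULL DEFAULT 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (Start TEXT, Finish TEXT, ID TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Data VALUES (?, ?, ?, ?)",
    [
        ("2008-10-01", "2008-10-02", "01", 10),
        ("2008-10-02", "2008-10-03", "02", 20),
    ],
)

# Hypothetical tracking column; rows that already exist get the default 0.
conn.execute("ALTER TABLE Data ADD COLUMN Processed INTEGER NOT NULL DEFAULT 0")


def take_unprocessed(conn):
    """Return the rows not handled yet, then mark them as processed."""
    rows = conn.execute(
        "SELECT Start, Finish, ID, Amount FROM Data "
        "WHERE Processed = 0 ORDER BY Start"
    ).fetchall()
    conn.execute("UPDATE Data SET Processed = 1 WHERE Processed = 0")
    return rows


first_batch = take_unprocessed(conn)

# A later insert that names its columns is unaffected by the new column
# and picks up Processed = 0 from the default.
conn.execute(
    "INSERT INTO Data (Start, Finish, ID, Amount) VALUES (?, ?, ?, ?)",
    ("2008-10-03", "2008-10-04", "01", 38),
)
second_batch = take_unprocessed(conn)

print(len(first_batch), "rows in the first pass")
print(second_batch)
```

The sketch only shows the flag. A run of equal IDs that straddles two passes would still have to be merged with the result stored from the earlier pass.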
Add a comment to this post if you would like help coding the solution.
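Answer 3: (score: 0)
Well, I decided to go down the iteration route, using a mixture of joins and cursors. By joining the Data table to itself, I can create a linked list of only those records that are consecutive.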
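-- One row per pair of adjacent rows (a, b) that share an ID.
-- a.Finish equals b.Start, so the Finish column identifies row b in Data.
-- The column types of #CONSEC are assumed.
CREATE TABLE #CONSEC (ID CHAR(2), Start DATETIME, Finish DATETIME, Amount INT)
INSERT INTO #CONSEC
SELECT a.ID, a.Start, a.Finish, b.Amount
FROM Data a JOIN Data b
ON (a.Finish = b.Start) AND (a.ID = b.ID)
Then I can unwind the list by iterating over it with a cursor, writing the adjusted amounts back to the Data table (and deleting the now-extraneous records from the Data table):
DECLARE @ID CHAR(2), @Start DATETIME, @Finish DATETIME, @Amount INT
DECLARE @ID_Last CHAR(2), @Start_Last DATETIME, @Finish_Last DATETIME
DECLARE @Total INT
DECLARE CCursor CURSOR FOR
    SELECT ID, Start, Finish, Amount FROM #CONSEC ORDER BY Start DESC
SET @Total = 0
OPEN CCursor
FETCH NEXT FROM CCursor INTO @ID, @Start, @Finish, @Amount
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Accumulate the amount of the later row of the pair
    SET @Total = @Total + @Amount
    SET @Start_Last = @Start
    SET @Finish_Last = @Finish
    SET @ID_Last = @ID
    -- Delete the later row of the pair (its Start equals @Finish)
    DELETE FROM Data WHERE Start = @Finish
    FETCH NEXT FROM CCursor INTO @ID, @Start, @Finish, @Amount
    -- Write the total back once the cursor is exhausted or the chain breaks
    IF (@@FETCH_STATUS <> 0) OR (@ID_Last <> @ID) OR (@Finish <> @Start_Last)
    BEGIN
        UPDATE Data
        SET Amount = Amount + @Total
        WHERE Start = @Start_Last
        SET @Total = 0
    END
END
CLOSE CCursor
DEALLOCATE CCursor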
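This all works, and performance is acceptable for the typical data that I am using.
I did find one small issue with the above code. Originally I was updating the Data table on every pass through the cursor loop, but this didn't work. It seems that you can only do one update on a record; multiple updates (in order to keep adding data) revert to reading the original contents of the record.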