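I have a record set from which I want to grab, for each PO, the record with the lowest date, and then the first record each time the cost changes. Any help would be greatly appreciated. There are about 7 million records, and the current cursor is not cutting it.
Sample: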
PO log_Ts cost
123 2012-06-26-10.37.44.035385 2.5896
123 2012-06-27-02.16.14.706817 2.5896
123 2012-06-26-10.28.57.540731 2.591
123 2012-06-26-10.37.43.948940 2.5896
123 2012-06-26-10.37.43.421713 2.5896
123 2012-06-26-18.34.37.191917 2.5896
123 2012-06-27-02.16.14.705622 2.5896
123 2012-06-27-04.33.18.264742 2.5896
123 2012-06-26-10.37.44.007667 2.5896
123 2012-06-26-10.37.43.706207 2.5896
123 2012-06-26-10.26.56.767121 2.5896
123 2012-06-26-10.37.43.919248 2.5896
Looking to grab:
PO log_Ts cost
123 2012-06-26-10.26.56.767121 2.5896
123 2012-06-26-10.28.57.540731 2.591
123 2012-06-26-10.37.43.421713 2.5896
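Answer 0 (score: 1)
You can try the following:
1. SELECT the rows in order of log_ts and associate a row number with each row.
2. SELF JOIN the result of #1 on firstTable.rownum = secondTable.rownum - 1. Each row of the working table then contains the current record and the next one.
3. Add a WHERE clause so that the cost values differ; the tuples you are interested in are those from the second set in the SELF JOIN.
E.g.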
RowNum PO log_Ts cost RowNum PO log_Ts cost
1 123 2012-06-26-10.37.44.035385 2.5896 2 123 2012-06-27-02.16.14.706817 2.5896
2 123 2012-06-27-02.16.14.706817 2.5896 3 123 2012-06-26-10.28.57.540731 2.591
3 123 2012-06-26-10.28.57.540731 2.591 4 123 2012-06-26-10.37.43.948940 2.5896
Query:
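WITH T (PO, Log_Ts, Cost, RowNum)
AS
(
    -- Number the rows of each PO in timestamp order
    SELECT PO, Log_Ts, Cost,
           Row_Number() OVER (PARTITION BY PO ORDER BY Log_Ts)
    FROM PO_INFO
)
-- Keep each row whose cost differs from the row just before it
SELECT t2.*
FROM T t1
JOIN T t2
  ON t1.PO = t2.PO
 AND t1.RowNum = t2.RowNum - 1
 AND t1.Cost <> t2.Cost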
Result:
123 2012-06-26-10.28.57.540731 2.591 2
123 2012-06-26-10.37.43.421713 2.5896 3
HTH.
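To see what a join on adjacent row numbers computes, the same pairing can be sketched in Python. This is only an illustration on a few of the sample rows, not part of the original answer; `rows` and `changed_rows` are hypothetical names. Note that the row with the lowest timestamp never appears as the second member of a pair, so it is not part of this result and would have to be added separately.

```python
# A few rows from the sample. Timestamps in this fixed-width format
# sort correctly as plain strings.
rows = [
    ("123", "2012-06-26-10.37.44.035385", 2.5896),
    ("123", "2012-06-26-10.28.57.540731", 2.591),
    ("123", "2012-06-26-10.37.43.421713", 2.5896),
    ("123", "2012-06-26-10.26.56.767121", 2.5896),
]


def changed_rows(rows):
    """Return each row whose cost differs from the row just before it."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # by PO, then log_Ts
    return [
        cur
        for prev, cur in zip(ordered, ordered[1:])  # (row n, row n+1) pairs
        if prev[0] == cur[0] and prev[2] != cur[2]  # same PO, different cost
    ]


for row in changed_rows(rows):
    print(row)
```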
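Answer 1 (score: 0)
This is really a better fit for procedural logic (code) than for set-based logic (SQL). So I would suggest doing it in code where possible: simply sort the data set and loop through it.
If code is not an option, you can do the same thing in a stored procedure using a cursor and a loop.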
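A minimal sketch of the sort-and-loop idea, assuming the rows are available as (PO, log_Ts, cost) tuples. The function name `first_rows_per_cost_run` and the `sample` list are hypothetical; the data is the sample from the question.

```python
from operator import itemgetter


def first_rows_per_cost_run(rows):
    """Yield the earliest row of every run of equal cost within each PO.

    `rows` is any iterable of (po, log_ts, cost) tuples.
    """
    prev_po = prev_cost = None
    # Sort by PO, then by timestamp, so each PO is walked in time order.
    for po, log_ts, cost in sorted(rows, key=itemgetter(0, 1)):
        # Emit a row when a new PO starts or the cost differs from the
        # previous row of the same PO.
        if po != prev_po or cost != prev_cost:
            yield (po, log_ts, cost)
        prev_po, prev_cost = po, cost


sample = [
    ("123", "2012-06-26-10.37.44.035385", 2.5896),
    ("123", "2012-06-27-02.16.14.706817", 2.5896),
    ("123", "2012-06-26-10.28.57.540731", 2.591),
    ("123", "2012-06-26-10.37.43.948940", 2.5896),
    ("123", "2012-06-26-10.37.43.421713", 2.5896),
    ("123", "2012-06-26-18.34.37.191917", 2.5896),
    ("123", "2012-06-27-02.16.14.705622", 2.5896),
    ("123", "2012-06-27-04.33.18.264742", 2.5896),
    ("123", "2012-06-26-10.37.44.007667", 2.5896),
    ("123", "2012-06-26-10.37.43.706207", 2.5896),
    ("123", "2012-06-26-10.26.56.767121", 2.5896),
    ("123", "2012-06-26-10.37.43.919248", 2.5896),
]

for row in first_rows_per_cost_run(sample):
    print(row)
```

The loop only ever needs the previous row, so for a table of millions of rows it may be preferable to have the database return the rows already ordered by PO and log_Ts and iterate over the cursor directly, rather than sorting in memory as this sketch does.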
Answer 2 (score: 0)
If your DBMS supports the LAG() function, you can use the CTE approach from t-clausen.dk's and Vikdor's answers, but without needing a self-join.
WITH t
AS
(
SELECT PO, log_Ts, Cost,
LAG( Cost ) OVER( PARTITION BY PO ORDER BY log_Ts) AS prevcost
FROM po_log_events
)
SELECT PO, log_Ts, Cost
FROM t
WHERE prevcost IS NULL
OR prevcost <> cost
ORDER BY PO, log_Ts
If your database has neither ROW_NUMBER() nor common table expressions (e.g. MySQL before version 8.0), you can get the same result with correlated subqueries:
SELECT DISTINCT p.PO, p.log_Ts, p.Cost
FROM po_log_events p
WHERE NOT EXISTS
( SELECT 1 FROM po_log_events p2
WHERE p2.PO = p.PO AND p2.log_Ts < p.log_Ts )
OR NOT EXISTS
( SELECT 1 FROM po_log_events p3
WHERE p3.PO = p.PO
AND p3.log_Ts =
(SELECT MAX(p4.log_ts)
FROM po_log_events p4
WHERE p4.PO = p.PO
AND p4.log_Ts < p.log_Ts
)
AND p3.Cost = p.Cost
)
If the table has a unique index on (PO, log_Ts), you can drop the DISTINCT.
Answer 3 (score: 0)
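The correlated-subquery version can be tried out on a few of the sample rows with Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in and is an assumption of this sketch, as are the column types and the added ORDER BY; the original answer does not mention it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE po_log_events (PO TEXT, log_Ts TEXT, Cost REAL)")
conn.executemany(
    "INSERT INTO po_log_events VALUES (?, ?, ?)",
    [
        ("123", "2012-06-26-10.37.44.035385", 2.5896),
        ("123", "2012-06-27-02.16.14.706817", 2.5896),
        ("123", "2012-06-26-10.28.57.540731", 2.591),
        ("123", "2012-06-26-10.37.43.421713", 2.5896),
        ("123", "2012-06-26-10.26.56.767121", 2.5896),
    ],
)

# Keep a row if it has no earlier row for its PO, or if the row
# immediately before it (by log_Ts) does not have the same cost.
query = """
SELECT DISTINCT p.PO, p.log_Ts, p.Cost
FROM po_log_events p
WHERE NOT EXISTS
      ( SELECT 1 FROM po_log_events p2
        WHERE p2.PO = p.PO AND p2.log_Ts < p.log_Ts )
   OR NOT EXISTS
      ( SELECT 1 FROM po_log_events p3
        WHERE p3.PO = p.PO
          AND p3.log_Ts = ( SELECT MAX(p4.log_Ts)
                            FROM po_log_events p4
                            WHERE p4.PO = p.PO
                              AND p4.log_Ts < p.log_Ts )
          AND p3.Cost = p.Cost )
ORDER BY p.PO, p.log_Ts
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On a large table the nested subqueries are evaluated per row, so an index on (PO, log_Ts) is likely to matter a great deal for this form of the query.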
;with a as
(
select po, log_ts, cost, row_number() over (partition by po order by log_ts) rn
from <table>
), b as
(
select po, log_ts, cost, 1 grp, rn from a where rn = 1
union all
select a.po, a.log_ts, a.cost, case when a.cost = b.cost then b.grp else b.grp+1 end, a.rn
from a
join b on a.rn = b.rn+1 and a.po = b.po
)
select po, min(log_ts) log_ts, cost
from b
group by po, cost, grp