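I'm using SQL Server 2008 and have run into a problem I haven't seen before. I have a dataset in which some values are duplicated within a quarter. I'm trying to select the most recent value for each quarter.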
SELECT PPAV.BusinessID
, (cast(year(PPAV.PartnerAttributeValueStartDate) as char(4)) + '0' + cast(datepart(qq, PPAV.PartnerAttributeValueStartDate) as char(1))) AS Quarter
, PAV.PartnerAttributeValue
FROM Partner_PartnerAttributeValue PPAV
JOIN PartnerAttributeValue PAV
ON PAV.PartnerAttributeValueID = PPAV.PartnerAttributeValueID
WHERE PAV.PartnerAttributeID = 7
AND (PPAV.PartnerAttributeValueID = 22 OR PPAV.PartnerAttributeValueID = 795 OR PPAV.PartnerAttributeValueID = 796)
GROUP BY PPAV.BusinessID
, (cast(year(PPAV.PartnerAttributeValueStartDate) as char(4)) + '0' + cast(datepart(qq, PPAV.PartnerAttributeValueStartDate) as char(1)))
, PAV.PartnerAttributeValue
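This is the code that produces the problem. I need only one value per quarter. Sometimes a change happens mid-quarter and the information is duplicated. When I tried to fix this, I used the code below, which actually made the problem worse, because the problem quarter now has 4 values.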
SELECT PPAV.BusinessID
, (cast(year(PPAV.PartnerAttributeValueStartDate) as char(4)) + '0' + cast(datepart(qq, PPAV.PartnerAttributeValueStartDate) as char(1))) AS Quarter
, CASE WHEN (cast(year(PPAV.PartnerAttributeValueStartDate) as char(4)) + '0' + cast(datepart(qq, PPAV.PartnerAttributeValueStartDate) as char(1))) = (cast(year(PPAV.PartnerAttributeValueStartDate) as char(4)) + '0' + cast(datepart(qq, PPAV.PartnerAttributeValueStartDate) as char(1)))
THEN SubHist.PartnerAttributeValue
ELSE PAV.PartnerAttributeValue
END AS PartnerAttributeValue
FROM Partner_PartnerAttributeValue PPAV
JOIN PartnerAttributeValue PAV
ON PAV.PartnerAttributeValueID = PPAV.PartnerAttributeValueID
JOIN ( SELECT PPAV.BusinessID
, MAX(PPAV.PartnerAttributeValueStartDate) AS MAX
, PAV.PartnerAttributeValue
FROM Partner_PartnerAttributeValue PPAV
JOIN PartnerAttributeValue PAV
ON PAV.PartnerAttributeValueID = PPAV.PartnerAttributeValueID
WHERE PAV.PartnerAttributeID = 7
AND (PPAV.PartnerAttributeValueID = 22 OR PPAV.PartnerAttributeValueID = 795 OR PPAV.PartnerAttributeValueID = 796)
GROUP BY PAV.PartnerAttributeValue
,PPAV.BusinessID
)SubHist
ON SubHist.BusinessID = PPAV.BusinessID
WHERE PAV.PartnerAttributeID = 7
AND (PPAV.PartnerAttributeValueID = 22 OR PPAV.PartnerAttributeValueID = 795 OR PPAV.PartnerAttributeValueID = 796)
GROUP BY PPAV.BusinessID
, (cast(year(PPAV.PartnerAttributeValueStartDate) as char(4)) + '0' + cast(datepart(qq, PPAV.PartnerAttributeValueStartDate) as char(1)))
, PAV.PartnerAttributeValue
, SubHist.PartnerAttributeValue
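I'm really not sure what I did to make the problem worse. I thought my CASE WHEN statement, drawing on the extra joined table, would fix it.
Any help is greatly appreciated!
Here is some sample data showing what I'm trying to eliminate: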
4356 201501 REGISTERED
4356 201502 REGISTERED
4356 201503 REGISTERED
4356 201504 REGISTERED
4356 201601 GOLD
4356 201601 REGISTERED
4356 201602 REGISTERED
4356 201603 REGISTERED
4356 201604 REGISTERED
The problem is that Q1 2016 has multiple values, which skews the data. There should be only the GOLD value, not both GOLD and REGISTERED.
Thanks!
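Answer 0 (score: 2)
Use a window function to generate a row number for each business ID and quarter, then keep only row number 1 (RN) from each group.
Because RN has to be generated before it can be filtered on, we wrap the query in a CTE or subquery and then apply RN = 1.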
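To show the idea in isolation, here is a minimal sketch that runs against a table variable instead of the real tables. The table variable @History, its column names, and the dates are invented for illustration; only the business ID and the attribute values mirror the sample data in the question.

```sql
-- Hypothetical stand-in for the joined tables; the dates are made up.
DECLARE @History TABLE (
    BusinessID int,
    StartDate  datetime,
    Value      varchar(20)
);

INSERT INTO @History (BusinessID, StartDate, Value)
VALUES (4356, '20151005', 'REGISTERED'),
       (4356, '20160104', 'REGISTERED'),
       (4356, '20160215', 'GOLD'),        -- mid-quarter change in Q1 2016
       (4356, '20160406', 'REGISTERED');

SELECT BusinessID, Quarter, Value
FROM (
    SELECT BusinessID
         , cast(year(StartDate) as char(4)) + '0'
           + cast(datepart(qq, StartDate) as char(1)) AS Quarter
         , Value
         -- Number the rows within each business and quarter, newest first.
         , row_number() OVER (PARTITION BY BusinessID
                                         , year(StartDate)
                                         , datepart(qq, StartDate)
                              ORDER BY StartDate DESC) AS RN
    FROM @History
) ranked
WHERE RN = 1          -- keep only the newest row per business and quarter
ORDER BY Quarter;
```

With these invented rows, Q1 2016 should come back once, with GOLD, because the GOLD row has the later start date in that quarter.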
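I also made a few other changes along the way, such as replacing the chain of OR conditions with IN.
Any of these additional changes may have introduced syntax errors.
UNTESTED. If the table structures and sample data were available in a SQL Fiddle, I would test the following.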
SELECT BusinessID, Quarter, PartnerAttributeValue
FROM (
SELECT PPAV.BusinessID
-- CONCAT was only added in SQL Server 2012, so on 2008 build the label with cast and +
, (cast(year(PPAV.PartnerAttributeValueStartDate) as char(4)) + '0' + cast(datepart(qq, PPAV.PartnerAttributeValueStartDate) as char(1))) AS Quarter
, PAV.PartnerAttributeValue
-- Number the rows within each business and quarter, newest start date first
, row_number()
OVER (PARTITION BY PPAV.BusinessID
, year(PPAV.PartnerAttributeValueStartDate)
, datepart(qq, PPAV.PartnerAttributeValueStartDate)
ORDER BY PPAV.PartnerAttributeValueStartDate DESC) AS RN
FROM Partner_PartnerAttributeValue PPAV
JOIN PartnerAttributeValue PAV
ON PAV.PartnerAttributeValueID = PPAV.PartnerAttributeValueID
WHERE PAV.PartnerAttributeID = 7
AND PPAV.PartnerAttributeValueID IN (22, 795, 796)
-- No GROUP BY: RN = 1 already leaves one row per business and quarter,
-- and ordering by a start date that is not grouped would raise an error
) cte
WHERE RN = 1
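The same filter can also be written as a CTE rather than a subquery, which some people find easier to read. This is only a sketch using the table and column names from the question; it has not been run against the real schema, and the CTE name ranked is arbitrary.

```sql
-- The leading semicolon guards against a preceding statement that was not terminated.
;WITH ranked AS (
    SELECT PPAV.BusinessID
         -- cast and + instead of CONCAT, which SQL Server 2008 does not have
         , cast(year(PPAV.PartnerAttributeValueStartDate) as char(4)) + '0'
           + cast(datepart(qq, PPAV.PartnerAttributeValueStartDate) as char(1)) AS Quarter
         , PAV.PartnerAttributeValue
         -- Newest start date gets RN = 1 within each business and quarter.
         , row_number() OVER (PARTITION BY PPAV.BusinessID
                                         , year(PPAV.PartnerAttributeValueStartDate)
                                         , datepart(qq, PPAV.PartnerAttributeValueStartDate)
                              ORDER BY PPAV.PartnerAttributeValueStartDate DESC) AS RN
    FROM Partner_PartnerAttributeValue PPAV
    JOIN PartnerAttributeValue PAV
      ON PAV.PartnerAttributeValueID = PPAV.PartnerAttributeValueID
    WHERE PAV.PartnerAttributeID = 7
      AND PPAV.PartnerAttributeValueID IN (22, 795, 796)
)
SELECT BusinessID, Quarter, PartnerAttributeValue
FROM ranked
WHERE RN = 1;
```

If two rows in the same quarter can share the same start date, row_number() picks between them arbitrarily; adding a second column to the ORDER BY (for example PPAV.PartnerAttributeValueID DESC) would make the choice deterministic.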