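I have sales data by year and quarter, and for the last year I want to fill in the missing quarters with the last available value.
Say we have this source table: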
+------+---------+-------+--------+
| year | quarter | sales | row_no |
+------+---------+-------+--------+
| 2018 |       1 |  4000 |      5 |
| 2018 |       2 |  6000 |      4 |
| 2018 |       3 |  5000 |      3 |
| 2018 |       4 |  3000 |      2 |
| 2019 |       1 |  8000 |      1 |
+------+---------+-------+--------+
Desired result:
+------+---------+-------+------------------------+
| year | quarter | sales |                        |
+------+---------+-------+------------------------+
| 2018 |       1 |  4000 |                        |
| 2018 |       2 |  6000 |                        |
| 2018 |       3 |  5000 |                        |
| 2018 |       4 |  3000 |                        |
| 2019 |       1 |  8000 |                        |
| 2019 |       2 |  8000 | <repeat the last value |
| 2019 |       3 |  8000 | <repeat the last value |
| 2019 |       4 |  8000 | <repeat the last value |
+------+---------+-------+------------------------+
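So the task is to build the Cartesian product of years and quarters and join to it the corresponding sales value or, where there is none, the last one.
This code gets me almost there: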
select r.year, k.quarter, t.sales
from (select distinct year from [MyTable]) r cross join
(select distinct quarter from [MyTable]) k left join
[MyTable] t
on (r.year = t.year and k.quarter=t.quarter) or row_no=1
How can I correct the last line (the join condition) so that the 2018 rows are not duplicated?
Answer 0 (score: 3)
One method uses outer apply:
select y.year, q.quarter, t.sales
from (select distinct year from [MyTable]) y cross join
(select distinct quarter from [MyTable]) q outer apply
(select top (1) t.*
from [MyTable] t
where t.year < y.year or
(t.year = y.year and t.quarter <= q.quarter)
order by t.year desc, t.quarter desc
) t;
For your volume of data, this should be fine.
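To try the outer apply query without the real table, here is a minimal sketch that loads the question's sample rows into a table variable. The name @Sales is hypothetical and only stands in for [MyTable]; the logic is the same as in the query above.

```sql
-- Hypothetical stand-in for [MyTable], holding the sample rows from the question.
DECLARE @Sales TABLE ([year] int, [quarter] int, sales int);

INSERT INTO @Sales ([year], [quarter], sales)
VALUES (2018, 1, 4000), (2018, 2, 6000), (2018, 3, 5000),
       (2018, 4, 3000), (2019, 1, 8000);

SELECT y.[year], q.[quarter], t.sales
FROM (SELECT DISTINCT [year] FROM @Sales) y
CROSS JOIN (SELECT DISTINCT [quarter] FROM @Sales) q
OUTER APPLY
     (-- latest existing row at or before this (year, quarter) cell
      SELECT TOP (1) s.sales
      FROM @Sales s
      WHERE s.[year] < y.[year]
         OR (s.[year] = y.[year] AND s.[quarter] <= q.[quarter])
      ORDER BY s.[year] DESC, s.[quarter] DESC
     ) t
ORDER BY y.[year], q.[quarter];
-- 2019 quarters 2 to 4 come back with 8000, the 2019 Q1 value.
```

The final ORDER BY is only there to make the output easy to compare with the desired result.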
A more efficient method, assuming that you are only filling in values at the end, is:
select y.year, q.quarter,
coalesce(t.sales, tdefault.sales) as sales
from (select distinct year from [MyTable]) y cross join
(select distinct quarter from [MyTable]) q left join
[MyTable] t
on t.year = y.year and
t.quarter = q.quarter cross join
(select top (1) t.*
from [MyTable] t
order by t.year desc, t.quarter desc
) tdefault
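The "only at the end" assumption matters: the cross join to the single latest row fills every gap with the final value, including a gap in the middle of the data. The sketch below puts both strategies side by side. The table variable @Sales, the missing 2018 Q3 row, and the hard-coded quarter list are assumptions made for illustration and are not part of the question's data.

```sql
-- Hypothetical data: same as the question, but with 2018 Q3 removed.
DECLARE @Sales TABLE ([year] int, [quarter] int, sales int);

INSERT INTO @Sales ([year], [quarter], sales)
VALUES (2018, 1, 4000), (2018, 2, 6000),
       (2018, 4, 3000), (2019, 1, 8000);

SELECT y.[year], q.[quarter],
       COALESCE(t.sales, tdefault.sales) AS filled_with_latest,
       prev.sales                        AS filled_with_previous
FROM (SELECT DISTINCT [year] FROM @Sales) y
CROSS JOIN (VALUES (1), (2), (3), (4)) q([quarter])  -- fixed list, since no row has quarter 3
LEFT JOIN @Sales t
       ON t.[year] = y.[year] AND t.[quarter] = q.[quarter]
CROSS JOIN
     (-- the single latest row in the whole table
      SELECT TOP (1) s.sales
      FROM @Sales s
      ORDER BY s.[year] DESC, s.[quarter] DESC
     ) tdefault
OUTER APPLY
     (-- latest row at or before this (year, quarter) cell
      SELECT TOP (1) s.sales
      FROM @Sales s
      WHERE s.[year] < y.[year]
         OR (s.[year] = y.[year] AND s.[quarter] <= q.[quarter])
      ORDER BY s.[year] DESC, s.[quarter] DESC
     ) prev
ORDER BY y.[year], q.[quarter];
-- For 2018 Q3: filled_with_latest = 8000, filled_with_previous = 6000.
-- Every other row shows the same value in both columns.
```

If gaps can only occur after the last recorded quarter, the two columns always agree and the cheaper cross join is enough.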
Answer 1 (score: 1)
A very different approach, using a CTE and some window functions. It does not require two scans of the table, nor a triangular join.
WITH VTE AS(
SELECT *
FROM (VALUES (2018,1,4000,5),
(2018,2,6000,4),
(2018,3,5000,3),
(2018,4,3000,2),
(2019,1,8000,1)) V([Year],[Quarter],sales, row_no)),
CTE AS(
SELECT Y.Year,
Q.Quarter,
V.sales,
V.row_no,
COUNT(CASE WHEN V.sales IS NOT NULL THEN 1 END) OVER (ORDER BY Y.[Year], Q.[Quarter]
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM (VALUES(2018),(2019)) Y([Year])
CROSS JOIN (VALUES(1),(2),(3),(4)) Q([Quarter])
LEFT JOIN VTE V ON Y.[Year] = V.[Year] AND Q.[Quarter] = V.[Quarter])
SELECT C.[Year],
C.[Quarter],
MAX(C.sales) OVER (PARTITION BY C.Grp) AS Sales
FROM CTE C;
This only works on SQL Server 2012+ (as ROWS BETWEEN was introduced in SQL Server 2012); hopefully, however, you aren't using 2008, as it is (almost) completely unsupported.
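To see why the grouping works, it can help to look at the intermediate Grp column on its own. The sketch below reuses the sample values and hard-coded year and quarter lists from the query above; the CTE name Grid is made up for this illustration, and COUNT(sales) is used as a shorthand for the CASE expression because COUNT over a column skips NULLs.

```sql
WITH Grid AS (
    -- every (year, quarter) cell, with sales where a row exists
    SELECT Y.[Year], Q.[Quarter], V.sales
    FROM (VALUES (2018), (2019)) Y([Year])
    CROSS JOIN (VALUES (1), (2), (3), (4)) Q([Quarter])
    LEFT JOIN (VALUES (2018, 1, 4000), (2018, 2, 6000), (2018, 3, 5000),
                      (2018, 4, 3000), (2019, 1, 8000)) V([Year], [Quarter], sales)
           ON V.[Year] = Y.[Year] AND V.[Quarter] = Q.[Quarter]
)
SELECT G.[Year], G.[Quarter], G.sales,
       -- running count of non-NULL sales: it only increases on rows that have a value
       COUNT(G.sales) OVER (ORDER BY G.[Year], G.[Quarter]
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM Grid G
ORDER BY G.[Year], G.[Quarter];
-- Grp is 1, 2, 3, 4 for the 2018 quarters and 5, 5, 5, 5 for the 2019 quarters.
```

Each group therefore contains exactly one non-NULL sales value followed by the empty cells that should inherit it, which is why MAX(sales) over PARTITION BY Grp returns the last known value.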
Answer 2 (score: 1)
I would simply do a JOIN:
SELECT TT.YEAR, TT.Quarter, COALESCE(T.SALES, MAX(T.SALES) OVER (PARTITION BY TT.YEAR)) AS sales
FROM (SELECT DISTINCT T.YEAR, TT.Quarter
FROM [MyTable] T CROSS JOIN
( SELECT DISTINCT TT.Quarter FROM [MyTable] TT ) TT
) TT LEFT JOIN
[MyTable] T
ON TT.YEAR = T.YEAR AND TT.Quarter = T.Quarter;
EDIT: I misread the question with regard to the other quarters, so you would need to use OUTER APPLY together with the JOIN:
SELECT TT.YEAR, TT.Quarter, COALESCE(T.SALES, T1.SALES) AS Sales
FROM (SELECT DISTINCT T.YEAR, TT.Quarter
FROM [MyTable] T CROSS JOIN
( SELECT DISTINCT TT.Quarter FROM [MyTable] TT ) TT
) TT LEFT JOIN
[MyTable] T
ON TT.YEAR = T.YEAR AND TT.Quarter = T.Quarter OUTER APPLY
( SELECT TOP (1) T.*
FROM [MyTable] T
WHERE T.YEAR = TT.YEAR
ORDER BY T.Quarter DESC
) T1;