I'm currently trying to improve a SQL query on SQL Server.
My working table looks like this:
CAT_HISTORY
DATE ID CATEGORY
----------- ----------- -----------
20121201 A 1
20121201 A 1
20121201 B 1
20121201 C 2
20131201 A 2
20131201 B 4
20131201 C 3
20141201 A 3
20141201 B 2
20141201 B 2
20141201 C 1
My goal is to retrieve the history of each ID's category. So far, I do it like this:
SELECT A.DATE
,COUNT(DISTINCT A.ID) AS NB_CLIENTS
,A.CATEGORY AS STARTING_CAT
,B.CATEGORY AS ENDING_CAT
FROM CAT_HISTORY A
INNER JOIN CAT_HISTORY B
ON (
A.ID= B.ID
AND
(
(
A.DATE = 20121201
AND B.DATE = 20131201
)
OR
(
A.DATE = 20131201
AND B.DATE = 20141201
)
)
)
WHERE A.DATE>= 20121201 AND B.DATE<= 20141201
GROUP BY A.DATE, A.CATEGORY,B.CATEGORY
ORDER BY A.DATE, A.CATEGORY,B.CATEGORY
The result is:
DATE_KEY STARTING_CAT ENDING_CAT NB_CLIENTS
----------- ----------- ----------- -----------
20121201 1 2 1
20121201 1 4 1
20121201 2 3 1
20131201 2 3 1
20131201 4 2 1
20131201 2 3 1
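The problem is that I have many more dates, I add one OR per date (about 15 different dates), and I have a lot of users. As a result, the query sometimes takes 15 minutes to return.
I suspect my inner join is a brute-force approach, and that there is a more elegant and efficient way to get the expected result.
My end goal is a Sankey diagram showing how users move from one category to another over time, so I need the number of users that move from one category to another between consecutive dates.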
Using Gordon Linoff's answer works well, but it counts duplicates:
SELECT DISTINCT DATE, CATEGORY,NEXT_CATEGORY, COUNT(*) AS NB_CLIENTS
FROM (
SELECT DISTINCT CH.*, LEAD(CATEGORY) OVER (PARTITION BY CH.ID ORDER BY DATE) AS NEXT_CATEGORY
FROM CAT_HISTORY CH
) CH
WHERE NEXT_CATEGORY IS NOT NULL
GROUP BY DATE, CATEGORY,NEXT_CATEGORY
Example. Expected:
DATE_KEY STARTING_CAT ENDING_CAT NB_CLIENTS
----------- ----------- ----------- -----------
20121201 1 2 1
20121201 1 4 1
20121201 2 3 1
20131201 2 3 1
20131201 4 2 1
20131201 2 3 1
With your solution:
DATE_KEY STARTING_CAT ENDING_CAT NB_CLIENTS
----------- ----------- ----------- -----------
20121201 1 1 1
20121201 1 2 1
20121201 1 4 1
20121201 2 3 1
20131201 2 3 1
20131201 4 2 1
20131201 2 3 1
20141201 2 2 1
Last edit:
I managed to find a workaround:
SELECT DISTINCT DATE, CATEGORY,NEXT_CATEGORY, COUNT(*) AS NB_CLIENTS
FROM (
SELECT DISTINCT CH.*, LEAD(CATEGORY) OVER (PARTITION BY CH.ID ORDER BY DATE) AS NEXT_CATEGORY
FROM (SELECT DISTINCT * FROM CAT_HISTORY) CH
) CH
WHERE NEXT_CATEGORY IS NOT NULL
GROUP BY DATE, CATEGORY,NEXT_CATEGORY
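For anyone who wants to reproduce this, here is a self-contained sketch of the same workaround, written with CTEs. It assumes SQL Server 2012 or later (for LEAD). The table variable @cat_history and the CTE names dedup and paired are made up for the example; the rows are the sample data from the top of the question.

```sql
-- Hypothetical table variable standing in for CAT_HISTORY (sample rows from the question).
DECLARE @cat_history TABLE ([date] INT, id VARCHAR(10), category INT);

INSERT INTO @cat_history ([date], id, category) VALUES
    (20121201, 'A', 1), (20121201, 'A', 1), (20121201, 'B', 1), (20121201, 'C', 2),
    (20131201, 'A', 2), (20131201, 'B', 4), (20131201, 'C', 3),
    (20141201, 'A', 3), (20141201, 'B', 2), (20141201, 'B', 2), (20141201, 'C', 1);

WITH dedup AS (
    -- Collapse exact duplicate rows first, so LEAD never pairs a row with its own copy.
    SELECT DISTINCT [date], id, category
    FROM @cat_history
),
paired AS (
    -- For each id, look up the category on that id's next date.
    SELECT [date], id, category,
           LEAD(category) OVER (PARTITION BY id ORDER BY [date]) AS next_category
    FROM dedup
)
SELECT [date]        AS date_key,
       category      AS starting_cat,
       next_category AS ending_cat,
       COUNT(*)      AS nb_clients
FROM paired
WHERE next_category IS NOT NULL      -- the last date of each id has no successor
GROUP BY [date], category, next_category
ORDER BY [date], category, next_category;
```

Because the duplicates are removed before LEAD runs, the outer SELECT DISTINCT is no longer needed; the GROUP BY already returns one row per (date, category, next_category).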
Answer 0 (score: 0)
If you want to look at pairwise changes, use lead() rather than fixed dates. In SQL Server 2012+, you can do:
select date, category, next_category, count(*)
from (select ch.*,
lead(category) over (partition by id order by date) as next_category
from cat_history ch
) ch
group by date, category, next_category;
In earlier versions of SQL Server, you can use similar logic with a correlated subquery or apply.
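The apply variant could look roughly like the sketch below. It is an illustration rather than a tested drop-in: the table variable @cat_history and the alias nxt are assumptions for the example, and the multi-row VALUES insert needs SQL Server 2008 or later (APPLY itself is available from 2005).

```sql
-- Hypothetical table variable standing in for cat_history (sample rows from the question).
DECLARE @cat_history TABLE ([date] INT, id VARCHAR(10), category INT);

INSERT INTO @cat_history ([date], id, category) VALUES
    (20121201, 'A', 1), (20121201, 'A', 1), (20121201, 'B', 1), (20121201, 'C', 2),
    (20131201, 'A', 2), (20131201, 'B', 4), (20131201, 'C', 3),
    (20141201, 'A', 3), (20141201, 'B', 2), (20141201, 'B', 2), (20141201, 'C', 1);

SELECT ch.[date],
       ch.category           AS starting_cat,
       nxt.category          AS ending_cat,
       COUNT(DISTINCT ch.id) AS nb_clients   -- duplicate rows for an id are counted once
FROM @cat_history ch
CROSS APPLY (
    -- Earliest row for the same id with a strictly later date.
    -- CROSS APPLY drops rows that have no later date, like "next_category is not null".
    SELECT TOP (1) n.category
    FROM @cat_history n
    WHERE n.id = ch.id
      AND n.[date] > ch.[date]
    ORDER BY n.[date]
) nxt
GROUP BY ch.[date], ch.category, nxt.category
ORDER BY ch.[date], ch.category, nxt.category;
```

Requiring a strictly later date means two identical rows on the same date are never paired with each other. If an id could have two different categories on the same date, TOP (1) would pick one of them arbitrarily, so that case would need an explicit tie-break.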
Answer 1 (score: 0)
Please check this. I replaced the date field with datefield.
declare @t table(datefield date , id varchar(10) , category int )
insert into @t values
(cast( '20121201' as date) , 'A', 1),
(cast( '20121201' as date) , 'B', 1),
(cast( '20121201' as date) , 'C', 2),
(cast( '20131201' as date) , 'A', 2),
(cast( '20131201' as date) , 'B', 4),
(cast( '20131201' as date) , 'C', 3),
(cast( '20141201' as date) , 'A', 3),
(cast( '20141201' as date) , 'B', 2),
(cast( '20141201' as date) , 'C', 1)
SELECT A.datefield
,COUNT(DISTINCT A.ID) AS NB_CLIENTS
,A.CATEGORY AS STARTING_CAT
,isnull(B.CATEGORY ,0) AS ENDING_CAT
FROM @T A
left JOIN @T B
ON
(
A.ID= B.ID AND
( b.datefield = dateadd( yy, 1 , a.datefield ) )
)
-- WHERE A.datefield>= '20121201' AND ( B.datefield<= '20141201' or B.datefield is null)
GROUP BY A.datefield, A.CATEGORY,B.CATEGORY
ORDER BY A.datefield, A.CATEGORY,B.CATEGORY
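The join above relies on the snapshots being exactly one year apart. If that does not hold for all of the roughly 15 dates, one possible variation is to number the distinct dates with DENSE_RANK() and join each snapshot to the next one. This is only a sketch: the CTE name ranked and the column snapshot_no are made up for the example, it uses an inner join (so rows without a following snapshot are dropped rather than shown with 0), and it includes the duplicate rows from the question's data.

```sql
DECLARE @t TABLE (datefield DATE, id VARCHAR(10), category INT);

INSERT INTO @t (datefield, id, category) VALUES
    ('20121201', 'A', 1), ('20121201', 'A', 1), ('20121201', 'B', 1), ('20121201', 'C', 2),
    ('20131201', 'A', 2), ('20131201', 'B', 4), ('20131201', 'C', 3),
    ('20141201', 'A', 3), ('20141201', 'B', 2), ('20141201', 'B', 2), ('20141201', 'C', 1);

WITH ranked AS (
    -- snapshot_no is 1 for the earliest date, 2 for the next one, and so on.
    -- DISTINCT removes exact duplicate rows (they share the same snapshot_no).
    SELECT DISTINCT datefield, id, category,
           DENSE_RANK() OVER (ORDER BY datefield) AS snapshot_no
    FROM @t
)
SELECT a.datefield,
       a.category           AS starting_cat,
       b.category           AS ending_cat,
       COUNT(DISTINCT a.id) AS nb_clients
FROM ranked a
INNER JOIN ranked b
    ON  b.id = a.id
    AND b.snapshot_no = a.snapshot_no + 1   -- the immediately following snapshot date
GROUP BY a.datefield, a.category, b.category
ORDER BY a.datefield, a.category, b.category;
```

Unlike lead(), this pairs each row with the next snapshot date overall, so an id that is missing from one snapshot produces no transition across the gap. Whether that is the desired behaviour for the Sankey diagram depends on how missing ids should be treated.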