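I have a large table of users (identified by guid), some associated values, and a timestamp recording when each row was inserted. A user may be associated with many rows in this table.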
guid | <other columns> | insertdate
For each month, I want to count how many unique new users were inserted. Doing this manually for one month is easy:
select count(distinct guid)
from table
where insertdate >= '20060201' and insertdate < '20060301'
and guid not in (select guid from table where
insertdate >= '20060101' and insertdate < '20060201')
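How can I do this in SQL for every consecutive month in one go?
I thought of using a ranking function to clearly associate each guid with a month: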
select guid
,dense_rank() over ( order by datepart(YYYY, insertdate),
datepart(m, insertdate)) as MonthRank
from table
and then iterating over each rank value:
declare @no_times int
declare @counter int = 1
set @no_times = select count(distinct concat(datepart(year, t.TransactionDateTime),
datepart(month, t.TransactionDateTime))) from table
while @no_times > 0 do
(
select count(*), @counter
where guid not in (select guid from table where rank = @counter)
and rank = @int + 1
@counter += 1
@no_times -= 1
union all
)
end
I know this strategy is probably the wrong way to go about it.
Ideally, I would like the result set to look like this:
MonthRank | NoNewUsers
If a SQL wizard could point me in the right direction, I would be very interested and grateful.
Answer 0: (score: 0)
SELECT
DATEPART(year,t.insertdate) AS YearNum
,DATEPART(mm,t.insertdate) as MonthNum
,COUNT(DISTINCT t.guid) AS NoNewUsers
,DENSE_RANK() OVER (ORDER BY COUNT(DISTINCT t.guid) DESC) AS MonthRank
FROM
table t
LEFT JOIN table t2
ON t.guid = t2.guid
AND t.insertdate > t2.insertdate
WHERE
t2.guid IS NULL
GROUP BY
DATEPART(year,t.insertdate)
,DATEPART(mm,t.insertdate)
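Use a left join to check whether the guid already exists in the table with an earlier insertdate. If it does not, count it with an ordinary aggregate. If you also want a rank showing which month had the most new users, you can use DENSE_RANK(), but because you are already grouping by month, it does not need a PARTITION BY clause.
Answer 1: (score: 0)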
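The anti-join idea above can be tried on a few rows without a SQL Server instance. The following is only a minimal sketch, assuming SQLite (through Python's standard sqlite3 module) in place of T-SQL; the table name events, the sample rows, and the use of strftime instead of DATEPART are all assumptions made for illustration, not part of the original answer.

```python
import sqlite3

# Hypothetical sample data (guid, insertdate); values are illustrative only.
# Guid "d" has two rows with the same earliest timestamp on purpose.
ROWS = [
    ("a", "2006-01-05"),
    ("a", "2006-02-10"),
    ("b", "2006-02-03"),
    ("b", "2006-02-20"),
    ("d", "2006-02-11"),
    ("d", "2006-02-11"),
    ("c", "2006-03-01"),
    ("a", "2006-03-15"),
]


def new_users_by_month_antijoin(rows):
    """Count guids per month of first appearance using a LEFT JOIN anti-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (guid TEXT, insertdate TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    query = """
        SELECT strftime('%Y', t.insertdate) AS year_num,
               strftime('%m', t.insertdate) AS month_num,
               COUNT(DISTINCT t.guid)       AS no_new_users
        FROM events t
        LEFT JOIN events t2
               ON t.guid = t2.guid
              AND t.insertdate > t2.insertdate  -- matches any earlier row of the same guid
        WHERE t2.guid IS NULL                   -- keep only rows with no earlier row
        GROUP BY year_num, month_num
        ORDER BY year_num, month_num
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


antijoin_counts = new_users_by_month_antijoin(ROWS)
print(antijoin_counts)
```

Both rows of guid "d" survive the anti-join because neither is strictly earlier than the other, which is why the count uses DISTINCT.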
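If you want the first time each guid was inserted, then your query is not quite right, because it only excludes guids seen in the previous month. You can get the first time with two levels of aggregation: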
select year(first_insertdate), month(first_insertdate), count(*)
from (select t.guid, min(insertdate) as first_insertdate
from t
group by t.guid
) t
group by year(first_insertdate), month(first_insertdate)
order by year(first_insertdate), month(first_insertdate);
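As a rough check of the two-level aggregation above, here is a small sketch that runs the same shape of query in SQLite via Python's sqlite3 module. The events table, the sample rows, and the strftime-based month label are assumptions for illustration; the T-SQL year() and month() functions are not available in SQLite.

```python
import sqlite3

# Hypothetical sample data (guid, insertdate); values are illustrative only.
SAMPLE_ROWS = [
    ("a", "2006-01-05"),
    ("a", "2006-02-10"),
    ("b", "2006-02-03"),
    ("b", "2006-02-20"),
    ("d", "2006-02-11"),
    ("c", "2006-03-01"),
    ("a", "2006-03-15"),
]


def new_users_by_first_month(rows):
    """Inner query: earliest insertdate per guid. Outer query: count guids per month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (guid TEXT, insertdate TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    query = """
        SELECT strftime('%Y-%m', first_insertdate) AS first_month,
               COUNT(*)                            AS no_new_users
        FROM (SELECT guid, MIN(insertdate) AS first_insertdate
              FROM events
              GROUP BY guid) AS firsts
        GROUP BY first_month
        ORDER BY first_month
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


first_seen_counts = new_users_by_first_month(SAMPLE_ROWS)
print(first_seen_counts)
```

Because the inner query returns exactly one row per guid, the outer COUNT(*) needs no DISTINCT, and the table is scanned without a self-join.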
If instead you want to count a guid again each time it returns after skipping at least one month, then you can use lag():
select year(insertdate), month(insertdate), count(*)
from (select t.*,
lag(insertdate) over (partition by guid order by insertdate) as prev_insertdate
from t
) t
where prev_insertdate is null or
datediff(month, prev_insertdate, insertdate) >= 2
group by year(insertdate), month(insertdate)
order by year(insertdate), month(insertdate);
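The lag() variant can be sketched the same way. This assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module, and it emulates T-SQL's datediff(month, ...) with a year * 12 + month index, since SQLite has no such function. The events table, the month_index column, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: guid "a" is absent in March and returns in April.
GAP_ROWS = [
    ("a", "2006-01-05"),
    ("a", "2006-02-10"),
    ("a", "2006-04-15"),
    ("b", "2006-02-03"),
    ("b", "2006-02-20"),
    ("c", "2006-03-01"),
]


def counts_with_returning_users(rows):
    """Count a guid on first sight and again whenever it comes back after skipping a month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (guid TEXT, insertdate TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    query = """
        WITH tagged AS (
            SELECT guid, insertdate,
                   CAST(strftime('%Y', insertdate) AS INTEGER) * 12
                 + CAST(strftime('%m', insertdate) AS INTEGER) AS month_index
            FROM events
        ),
        with_prev AS (
            SELECT guid, insertdate, month_index,
                   LAG(month_index) OVER (PARTITION BY guid ORDER BY insertdate)
                       AS prev_month_index
            FROM tagged
        )
        SELECT strftime('%Y-%m', insertdate) AS month, COUNT(*) AS no_users
        FROM with_prev
        WHERE prev_month_index IS NULL               -- first row for this guid
           OR month_index - prev_month_index >= 2    -- at least one whole month skipped
        GROUP BY month
        ORDER BY month
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


gap_counts = counts_with_returning_users(GAP_ROWS)
print(gap_counts)
```

In this sample, guid "a" is counted in January and again in April, while its February row is ignored because the previous row is only one month back.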
Answer 2: (score: 0)
I solved it with a dreaded while loop, and then a friend helped me solve it more efficiently in a different way.
Loop version:
--ranked by month
select t.TransactionID
,t.guid
,concat(datepart(year, t.InsertDate), datepart(month,
t.InsertDate)) MonthRankName
,dense_rank() over ( order by datepart(YYYY, t.InsertDate),
datepart(m, t.InsertDate)) as MonthRank
into #ranked
from table t;
--iterate
declare @counter int = 1
declare @no_times int
select @no_times = count(distinct concat(datepart(year, t.InsertDate),
datepart(month, t.InsertDate))) from table t;
select count(distinct r.guid) as NewUnique, r.Monthrank into #results
from #ranked r
where r.MonthRank = 1 group by r.MonthRank;
while @no_times > 1
begin
insert into #results
select count(distinct rt.guid) as NewUnique, @counter + 1 as MonthRank
from #ranked rt
where rt.guid not in
(
select rt2.guid from #ranked rt2
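-- note: this only excludes guids seen in the immediately preceding month;
-- use rt2.MonthRank <= @counter to exclude guids seen in any earlier month
where rt2.MonthRank = @counter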
)
and rt.MonthRank = @counter + 1
set @counter = @counter+1
set @no_times = @no_times-1
end
select * from #results r
This turned out to run very slowly (as you would expect).
The following approach turned out to be about 10 times faster:
select t.guid,
cast (concat(datepart(year, min(t.InsertDate)),
case when datepart(month, min(t.InsertDate)) < 10 then
'0'+cast( datepart(month, min(t.InsertDate)) as varchar(10))
else cast (datepart(month, min(t.InsertDate)) as varchar(10)) end
) as int) as MonthRankName
into #NewUnique
from table t
group by t.guid;
select count(1) as NewUniques, t.MonthRankName from #NewUnique t
group by t.MonthRankName
order by t.MonthRankName
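It simply identifies the first month in which each guid appears and then counts how many guids fall into each month. The YearMonth value is built with some simple string handling (this seemed to be more efficient than format([date], 'yyyyMM'), but that needs more experimentation).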
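The zero-padded concatenation above produces an integer such as 200602. As a side note that is not part of the original answer, the same key can be computed arithmetically as year * 100 + month, which avoids the string handling altogether (in T-SQL that would be year(min(t.InsertDate)) * 100 + month(min(t.InsertDate))). The sketch below, with hypothetical function names and sample dates, checks that the two forms agree and then counts guids per key.

```python
from collections import Counter
from datetime import date


def year_month_key(d):
    """Arithmetic form: 200602 for any date in February 2006."""
    return d.year * 100 + d.month


def padded_string_key(d):
    """Mirror of the concat-and-pad approach: zero-pad the month, then cast to int."""
    month = str(d.month)
    if d.month < 10:
        month = "0" + month
    return int(str(d.year) + month)


# Hypothetical first-seen date per guid (the result of min(InsertDate) group by guid).
FIRST_SEEN = {
    "a": date(2006, 1, 5),
    "b": date(2006, 2, 3),
    "c": date(2006, 11, 30),
    "d": date(2006, 2, 11),
}

# Count how many guids first appeared in each YearMonth bucket.
new_uniques = Counter(year_month_key(d) for d in FIRST_SEEN.values())
print(sorted(new_uniques.items()))
```

Whether the arithmetic form is actually faster than the string version on a given server is untested here and would need the same kind of experimentation the answer mentions.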