背景:我想在某个 每周滚动一次,以便如果骑手28天未骑车,他们可以采取必要的干预措施。
以下是我要在单个查询中实现的指标列表
输出查询(单个查询)中列的定义:
日期:将根据以下度量标准计算日期。
city_id:城市的ID
dau:在该日期完成至少一趟的不同骑手人数。
wau:相对于“日期”
new_rider:相对于“日期”列中的日期,在过去7天内使用first trip
的不同骑手的数量
previous_mau:相对于“日期”列中的日期,在过去56天至过去29天之间至少进行了一次旅行的骑手人数统计
mau_28:相对于“日期”列中的日期,在过去28天内至少完成了一次旅行的骑手的数量
保留:previous_mau和mau_28阶段中不同骑手的交集
复活:在上一个_mau阶段不活跃但在mau_28阶段活跃的独特骑手的数量。
用户流失:在上一个_mau阶段活跃但在 mau_28相。
有效:如果骑手在相应时期内至少完成了一次旅行 无效:如果骑手在相应的时间段内没有单程出行
以下是我尝试过的内容:
create table Tripdata
(
[date] date,
rider_id int,
trip_id int,
city_id int,
status varchar(100)
)
go
查询要插入值
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 348, 1, 8, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1729, 2, 5, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5265, 3, 4, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2098, 4, 4, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4942, 5, 8, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5424, 6, 11, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4269, 7, 7, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5649, 8, 1, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2385, 9, 6, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5161, 10, 8, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 571, 11, 8, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5072, 12, 9, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1233, 13, 5, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2490, 14, 5, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5665, 15, 9, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1400, 16, 2, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 3324, 17, 4, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2533, 18, 13, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5314, 19, 11, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4773, 20, 12, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5544, 21, 2, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1232, 22, 5, N'completed')
GO
以下是我到目前为止所获得/获得的:(贷方为LukStorms)
SELECT *
FROM
(
SELECT [datee], city_id,
COUNT(DISTINCT rider_id) AS [dau]
FROM [dbo].[Tripdata]
GROUP BY [datee], city_id
) t
OUTER APPLY
(
SELECT
COUNT(rider_id) AS [wau],
COUNT(CASE WHEN [rides]=1 THEN rider_id END) AS [new_rider]
FROM
(
SELECT t2.city_id, t2.rider_id,
COUNT(*) AS [rides]
FROM [dbo].[Tripdata] t2
WHERE t2.city_id = t.city_id
AND t2.[datee] <= t.[datee]
AND t2.[datee]>=dateadd(day,-7,t.[datee])
GROUP BY t2.city_id, t2.rider_id
) q
GROUP BY city_id
) last7
OUTER APPLY
(
SELECT
COUNT(DISTINCT t2.rider_id) AS [previous_mau]
FROM [dbo].[Tripdata] t2
WHERE t2.city_id = t.city_id
AND t2.[datee] <= dateadd(day,-29,t.[datee])
AND t2.[datee] >= dateadd(day,-56,t.[datee])
) prev29
ORDER BY t.[datee], t.city_id;
如何在单个查询中实现上述查询的所有结果?以及如何编写一个查询以在单个查询输出中回答上述7、8、9、10个问题?
此外,在将骑手映射到特定城市时,它们也是一些特殊的考虑因素
一个骑手可能从多个城市旅行,这可能导致计算骑手活跃 或在多个城市不活跃。因此,要解决此问题,需要将骑手映射到 仅一个城市。骑手应该被映射到他们从哪个城市出发 仅考虑最近20次旅行的最大旅行次数。 2.对于与城市有关的所有计算,重要的是要考虑映射到骑手的城市 而不是旅行发生的城市。 3.我们的数据库系统没有标准模式功能,因此车手城市地图 需要导出。
答案 0 :(得分:2)
这就是我要怎么做->
MS SQL Server 2017架构设置:
create table TripData
(
[date] date,
rider_id int,
trip_id int,
city_id int,
status varchar(100)
)
go
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 348, 1, 8, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1729, 2, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5265, 3, 4, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2098, 4, 4, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4942, 5, 8, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5424, 6, 11, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4269, 7, 7, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5649, 8, 1, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2385, 9, 6, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5161, 10, 8, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 571, 11, 8, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5072, 12, 9, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1233, 13, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2490, 14, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5665, 15, 9, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1400, 16, 2, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 3324, 17, 4, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2533, 18, 13, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5314, 19, 11, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4773, 20, 12, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5544, 21, 2, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1232, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-28T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-03-28T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-01-28T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-15T00:00:00.000' AS DateTime), 222, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 222, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 333, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 333, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-15T00:00:00.000' AS DateTime), 222, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 222, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 333, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 333, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 3333, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 3333, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-28T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-03-28T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-01-28T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')
查询1 :
;WITH AdddedIndicators AS
(
/*For every record, calculate the key metrics needed to aggerate up.
days_back_last_ride can make use of LAG() by rider and city ordered by date or null if no value,
ISNULL the result to bring it back to 0 meaning no days back(first ride).
Days_back_last_ride=0 could be used to determine first ride date, however, that would ot fit the between 1..7 rule
so we need a first_ride_date. Again, using a window function by rider and city, grab the min date*/
SELECT
td.date, rider_id, city_id,
days_back_last_ride = ISNULL(DATEDIFF(DAY,LAG(date) OVER(PARTITION BY rider_id,city_id ORDER BY date),td.date),0),
first_ride_date = MIN(date) OVER (PARTITION BY rider_id, city_id)
FROM
TripData td
)
,Normalized AS
(
/*The need metrics have been calculated above for the bulk of your calcs with the data, query it and
build up, aggregates up, flags for each rider/city/date so we can ultimatley formulate this for each rider/city
since this is by rider, city and date the user will allocate points to every city
vistited in a given day*/
SELECT
date, city_id, rider_id,
dau= COUNT(DISTINCT rider_id),
wau_flag = SUM(CASE WHEN days_back_last_ride BETWEEN 1 AND 7 THEN 1 ELSE 0 END),
new_rider_flag = SUM(CASE WHEN DATEDIFF(DAY,first_ride_date,date) <= 7 THEN 1 ELSE 0 END),
previous_mau_flag = SUM(CASE WHEN days_back_last_ride BETWEEN 29 AND 56 THEN 1 ELSE 0 END),
mau_28_flag = SUM(CASE WHEN days_back_last_ride BETWEEN 1 AND 28 THEN 1 ELSE 0 END),
retained = CASE WHEN SUM(CASE WHEN days_back_last_ride BETWEEN 1 AND 28 THEN 1 ELSE 0 END) > 1
AND
SUM(CASE WHEN days_back_last_ride BETWEEN 29 AND 56 THEN 1 ELSE 0 END) > 1 THEN 1 ELSE 0 END
FROM
AdddedIndicators
GROUP BY
city_id, date, rider_id
)
SELECT
/* Finalize the results by date and city
The flags have been made by user, city and date above.
So gather each data piont and sum them up based on the rule set */
date, city_id,
dau = SUM(dau),
wau_flag = SUM(CASE WHEN wau_flag >= 1 THEN 1 ELSE 0 END),
retained = SUM(CASE WHEN previous_mau_flag >= 1 AND mau_28_flag >= 1 THEN 1 ELSE 0 END),
resurrect = SUM(CASE WHEN previous_mau_flag = 0 AND mau_28_flag >= 1 THEN 1 ELSE 0 END),
churn = SUM(CASE WHEN previous_mau_flag >= 1 AND mau_28_flag = 0 THEN 1 ELSE 0 END)
FROM
Normalized
GROUP BY
date, city_id
Results :
| date | city_id | dau | wau_flag | retained | resurrect | churn |
|------------|---------|-----|----------|----------|-----------|-------|
| 2019-06-01 | 1 | 1 | 0 | 0 | 0 | 0 |
| 2019-06-01 | 2 | 2 | 0 | 0 | 0 | 0 |
| 2019-06-01 | 4 | 3 | 0 | 0 | 0 | 0 |
| 2019-01-28 | 5 | 2 | 0 | 0 | 0 | 0 |
| 2019-03-28 | 5 | 2 | 0 | 0 | 0 | 0 |
| 2019-05-01 | 5 | 3 | 0 | 0 | 0 | 2 |
| 2019-05-15 | 5 | 1 | 0 | 0 | 0 | 0 |
| 2019-05-28 | 5 | 2 | 0 | 0 | 2 | 0 |
| 2019-06-01 | 5 | 8 | 2 | 0 | 3 | 1 |
| 2019-06-01 | 6 | 1 | 0 | 0 | 0 | 0 |
| 2019-06-01 | 7 | 1 | 0 | 0 | 0 | 0 |
| 2019-06-01 | 8 | 4 | 0 | 0 | 0 | 0 |
| 2019-06-01 | 9 | 2 | 0 | 0 | 0 | 0 |
| 2019-05-01 | 10 | 2 | 0 | 0 | 0 | 0 |
| 2019-05-15 | 10 | 1 | 0 | 0 | 0 | 0 |
| 2019-06-01 | 10 | 3 | 0 | 0 | 1 | 2 |
| 2019-06-01 | 11 | 2 | 0 | 0 | 0 | 0 |
| 2019-06-01 | 12 | 1 | 0 | 0 | 0 | 0 |
| 2019-06-01 | 13 | 1 | 0 | 0 | 0 | 0 |
答案 1 :(得分:1)
尝试一下:
select [date], city_id
,(select count(distinct [rider_id]) from #Tripdata b where b.[date] = a.[date] and a.city_id = b.city_id) as [dau]
,(select count(distinct [rider_id]) from #Tripdata b where b.[date] between dateadd(day, -7, a.[date]) and a.[date] and a.city_id = b.city_id) as [wau]
,(select count(distinct [rider_id]) from #Tripdata b where a.city_id = b.city_id
and b.[rider_id] NOT IN (Select c.[rider_id] from #Tripdata c where c.[date] < dateadd(day, -7, a.[date]))
and b.[rider_id] NOT IN (Select c.[rider_id] from #Tripdata c where c.[date] > a.[date])
) as [new_rider]
,(select count(distinct [rider_id]) from #Tripdata b where a.city_id = b.city_id
and b.[rider_id] NOT IN (Select c.[rider_id] from #Tripdata c where c.[date] < dateadd(day, -56, a.[date]))
and b.[rider_id] NOT IN (Select c.[rider_id] from #Tripdata c where c.[date] > dateadd(day, -29, a.[date]))
) as [previous_mau]
from #Tripdata a
group by [date], city_id