Customer retention metrics in SQL for an Uber CRM case study

Posted: 2019-11-28 18:30:43

Tags: sql sql-server analytics

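Background: I want to track rider activity on a weekly rolling basis so that, if a rider has not taken a ride for 28 days, the necessary intervention can be made.

Link: Problem in detail

Below is the list of metrics I want to compute in a single query.

Definitions of the columns in the output (single query):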

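  1. date: the date for which the metrics below are calculated.

  2. city_id: the ID of the city.

  3. dau: number of distinct riders who completed at least one trip on that date.

  4. wau: number of distinct riders who completed at least one trip in the last 7 days, relative to the date in the "date" column.

  5. new_rider: number of distinct riders who took their first trip in the last 7 days, relative to the date in the "date" column.

  6. previous_mau: number of riders who took at least one trip between the past 56 days and the past 29 days, relative to the date in the "date" column.

  7. mau_28: number of riders who completed at least one trip in the last 28 days, relative to the date in the "date" column.

  8. retained: number of distinct riders in the intersection of the previous_mau and mau_28 periods.

  9. resurrected: number of distinct riders who were inactive in the previous_mau period but active in the mau_28 period.

  10. churned: number of distinct riders who were active in the previous_mau period but inactive in the mau_28 period.

Active: the rider completed at least one trip in the corresponding period.
Inactive: the rider completed no trips in the corresponding period.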
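The first few definitions can be illustrated for a single reference date. This is a minimal T-SQL sketch, not a definitive implementation: the table variable @trips, the variable @as_of and the sample rows are hypothetical, and it assumes the 7-day window includes the reference date itself (date - 6 through date), which is only one possible reading of "the last 7 days".

```sql
-- Hypothetical sample data: one row per completed trip.
DECLARE @trips TABLE (trip_date date, rider_id int);
INSERT INTO @trips (trip_date, rider_id) VALUES
    ('2019-05-01', 1), ('2019-06-01', 1),   -- existing rider, rides on the reference date
    ('2019-05-28', 2),                      -- first trip inside the 7-day window
    ('2019-06-01', 3),                      -- first trip on the reference date
    ('2019-05-10', 4);                      -- only trip is outside the 7-day window

DECLARE @as_of date = '2019-06-01';         -- hypothetical reference date

-- Assumed window: [@as_of - 6 days, @as_of], i.e. 7 calendar days.
WITH riders AS (
    SELECT rider_id,
           MIN(trip_date) AS first_trip,
           MAX(CASE WHEN trip_date = @as_of THEN 1 ELSE 0 END) AS on_day,
           MAX(CASE WHEN trip_date BETWEEN DATEADD(DAY, -6, @as_of) AND @as_of
                    THEN 1 ELSE 0 END) AS in_week
    FROM @trips
    WHERE trip_date <= @as_of               -- ignore trips after the reference date
    GROUP BY rider_id
)
SELECT SUM(on_day)  AS dau,
       SUM(in_week) AS wau,
       SUM(CASE WHEN first_trip >= DATEADD(DAY, -6, @as_of) THEN 1 ELSE 0 END) AS new_rider
FROM riders;
-- With the sample rows: dau = 2, wau = 3, new_rider = 2
```

Reducing each rider to one row of 0/1 flags first means every metric becomes a plain SUM, so a rider with several trips in the window is still counted once.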

Here is what I have tried:

create table Tripdata
(
  [datee] date,
  rider_id int,
  trip_id int,
  city_id int,
  status varchar(100)
)
go

Query to insert the values:

INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 348, 1, 8, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1729, 2, 5, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5265, 3, 4, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2098, 4, 4, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4942, 5, 8, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5424, 6, 11, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4269, 7, 7, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5649, 8, 1, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2385, 9, 6, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5161, 10, 8, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 571, 11, 8, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5072, 12, 9, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1233, 13, 5, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2490, 14, 5, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5665, 15, 9, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1400, 16, 2, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 3324, 17, 4, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2533, 18, 13, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5314, 19, 11, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4773, 20, 12, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5544, 21, 2, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1232, 22, 5, N'completed')
GO

Here is what I have so far (credit to LukStorms):

   SELECT *
    FROM 
    (
        SELECT  [datee], city_id, 
        COUNT(DISTINCT rider_id) AS [dau]
        FROM [dbo].[Tripdata]
        GROUP BY [datee], city_id
    ) t
    OUTER APPLY
    (
       SELECT
       COUNT(rider_id) AS [wau],
       COUNT(CASE WHEN [rides]=1 THEN rider_id END) AS [new_rider]
       FROM
       (
          SELECT t2.city_id, t2.rider_id,
           COUNT(*) AS [rides]
          FROM [dbo].[Tripdata] t2
          WHERE t2.city_id = t.city_id
            AND t2.[datee] <= t.[datee]
          AND t2.[datee]>=dateadd(day,-7,t.[datee])
          GROUP BY t2.city_id, t2.rider_id
       ) q
       GROUP BY city_id
    ) last7
    OUTER APPLY
    (
        SELECT 
         COUNT(DISTINCT t2.rider_id) AS [previous_mau]
        FROM [dbo].[Tripdata] t2
        WHERE t2.city_id = t.city_id
          AND t2.[datee] <= dateadd(day,-29,t.[datee])
          AND t2.[datee] >= dateadd(day,-56,t.[datee])
    ) prev29

    ORDER BY t.[datee], t.city_id;

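How can I get all of the above results in a single query? And how do I write one query whose output answers items 7, 8, 9 and 10 above?

In addition, there are some special considerations when mapping a rider to a particular city:

  1. A rider may take trips from multiple cities, which could cause the rider to be counted as active or inactive in more than one city. To handle this, each rider must be mapped to exactly one city: the city from which they took the most trips, considering only their 20 most recent trips.

  2. For all city-related calculations, use the city mapped to the rider rather than the city where the trip took place.

  3. Our database system has no standard mode function, so the rider-to-city mapping has to be derived.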
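The mapping rule above can be sketched in T-SQL as follows. This is only one possible reading: the table variable @trips and its rows are hypothetical, trips are ordered by date and then trip_id to pick the 20 most recent (an assumption, since the sample data repeats trip_id values), and ties between cities are broken by the most recent trip and then the lowest city_id, which the problem statement does not specify.

```sql
-- Hypothetical sample data.
DECLARE @trips TABLE (trip_date date, rider_id int, trip_id int, city_id int);
INSERT INTO @trips (trip_date, rider_id, trip_id, city_id) VALUES
    ('2019-05-01', 222, 1, 5),
    ('2019-05-15', 222, 2, 10),
    ('2019-06-01', 222, 3, 10),
    ('2019-06-01', 333, 4, 5);

WITH recent AS (
    -- Number each rider's trips from newest to oldest.
    SELECT rider_id, city_id, trip_date,
           ROW_NUMBER() OVER (PARTITION BY rider_id
                              ORDER BY trip_date DESC, trip_id DESC) AS rn
    FROM @trips
),
city_counts AS (
    -- Count trips per city among each rider's 20 most recent trips.
    SELECT rider_id, city_id,
           COUNT(*)       AS trips,
           MAX(trip_date) AS last_trip
    FROM recent
    WHERE rn <= 20
    GROUP BY rider_id, city_id
),
ranked AS (
    -- Rank cities per rider: most trips first, then assumed tie-breakers.
    SELECT rider_id, city_id,
           ROW_NUMBER() OVER (PARTITION BY rider_id
                              ORDER BY trips DESC, last_trip DESC, city_id) AS city_rank
    FROM city_counts
)
SELECT rider_id, city_id AS mapped_city_id
FROM ranked
WHERE city_rank = 1
ORDER BY rider_id;
-- With the sample rows: rider 222 maps to city 10, rider 333 maps to city 5
```

Joining this result back to the trip table on rider_id and grouping by mapped_city_id instead of the trip's own city_id would apply the second consideration.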

2 Answers:

Answer 0 (score: 2)

This is how I would do it:

SQL Fiddle

MS SQL Server 2017 Schema Setup

create table TripData
(
  [date] date,
  rider_id int,
  trip_id int,
  city_id int,
  status varchar(100)
)
go
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 348, 1, 8, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1729, 2, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5265, 3, 4, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2098, 4, 4, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4942, 5, 8, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5424, 6, 11, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4269, 7, 7, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5649, 8, 1, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2385, 9, 6, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5161, 10, 8, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 571, 11, 8, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5072, 12, 9, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1233, 13, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2490, 14, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5665, 15, 9, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1400, 16, 2, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 3324, 17, 4, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2533, 18, 13, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5314, 19, 11, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4773, 20, 12, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5544, 21, 2, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1232, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-28T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-03-28T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-01-28T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-15T00:00:00.000' AS DateTime), 222, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 222, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 333, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 333, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-15T00:00:00.000' AS DateTime), 222, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 222, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 333, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 333, 22, 10, N'completed')
GO

INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 3333, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 3333, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-28T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-03-28T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-01-28T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')

Query 1

;WITH AdddedIndicators AS
(
   /*For every record, calculate the key metrics needed to aggregate up.
     days_back_last_ride can make use of LAG() by rider and city ordered by date, or NULL if no value;
     ISNULL the result to bring it back to 0, meaning no days back (first ride).
     days_back_last_ride = 0 could be used to determine the first ride date, however, that would not fit the BETWEEN 1 AND 7 rule,
     so we need a first_ride_date. Again, using a window function by rider and city, grab the min date.*/
   SELECT 
      td.date, rider_id, city_id,
      days_back_last_ride = ISNULL(DATEDIFF(DAY,LAG(date) OVER(PARTITION BY rider_id,city_id ORDER BY date),td.date),0),
      first_ride_date =  MIN(date) OVER (PARTITION BY rider_id, city_id)
  FROM    
      TripData td
  )
,Normalized AS
(
    /*The needed metrics have been calculated above for the bulk of your calcs. With that data, query it and
    build up (aggregate) flags for each rider/city/date so we can ultimately formulate this for each rider/city.
    Since this is by rider, city and date, the user will allocate points to every city
    visited in a given day.*/
    SELECT 
        date, city_id, rider_id,
        dau= COUNT(DISTINCT rider_id),
        wau_flag = SUM(CASE WHEN  days_back_last_ride BETWEEN 1 AND 7 THEN 1 ELSE 0 END),
        new_rider_flag = SUM(CASE WHEN DATEDIFF(DAY,first_ride_date,date) <= 7 THEN 1 ELSE 0 END),
        previous_mau_flag = SUM(CASE WHEN  days_back_last_ride BETWEEN 29 AND 56 THEN 1 ELSE 0 END),
        mau_28_flag = SUM(CASE WHEN days_back_last_ride BETWEEN 1 AND 28 THEN 1 ELSE 0 END),
        retained = CASE WHEN SUM(CASE WHEN days_back_last_ride BETWEEN 1 AND 28 THEN 1 ELSE 0 END) > 1 
                   AND   
                   SUM(CASE WHEN  days_back_last_ride BETWEEN 29 AND 56 THEN 1 ELSE 0 END) > 1 THEN 1 ELSE 0 END
    FROM 
        AdddedIndicators
    GROUP BY
        city_id, date, rider_id
)
SELECT 
    /* Finalize the results by date and city.
    The flags have been made by user, city and date above,
    so gather each data point and sum them up based on the rule set. */
    date, city_id,
    dau = SUM(dau),
    wau_flag = SUM(CASE WHEN wau_flag  >= 1 THEN 1 ELSE 0 END),
    retained = SUM(CASE WHEN previous_mau_flag >= 1 AND mau_28_flag >= 1 THEN 1 ELSE 0 END),
    resurrect = SUM(CASE WHEN previous_mau_flag = 0 AND mau_28_flag >= 1 THEN 1 ELSE 0 END),
    churn  = SUM(CASE WHEN previous_mau_flag >= 1 AND mau_28_flag = 0 THEN 1 ELSE 0 END)
FROM 
    Normalized
GROUP BY
    date, city_id   

Results

|       date | city_id | dau | wau_flag | retained | resurrect | churn |
|------------|---------|-----|----------|----------|-----------|-------|
| 2019-06-01 |       1 |   1 |        0 |        0 |         0 |     0 |
| 2019-06-01 |       2 |   2 |        0 |        0 |         0 |     0 |
| 2019-06-01 |       4 |   3 |        0 |        0 |         0 |     0 |
| 2019-01-28 |       5 |   2 |        0 |        0 |         0 |     0 |
| 2019-03-28 |       5 |   2 |        0 |        0 |         0 |     0 |
| 2019-05-01 |       5 |   3 |        0 |        0 |         0 |     2 |
| 2019-05-15 |       5 |   1 |        0 |        0 |         0 |     0 |
| 2019-05-28 |       5 |   2 |        0 |        0 |         2 |     0 |
| 2019-06-01 |       5 |   8 |        2 |        0 |         3 |     1 |
| 2019-06-01 |       6 |   1 |        0 |        0 |         0 |     0 |
| 2019-06-01 |       7 |   1 |        0 |        0 |         0 |     0 |
| 2019-06-01 |       8 |   4 |        0 |        0 |         0 |     0 |
| 2019-06-01 |       9 |   2 |        0 |        0 |         0 |     0 |
| 2019-05-01 |      10 |   2 |        0 |        0 |         0 |     0 |
| 2019-05-15 |      10 |   1 |        0 |        0 |         0 |     0 |
| 2019-06-01 |      10 |   3 |        0 |        0 |         1 |     2 |
| 2019-06-01 |      11 |   2 |        0 |        0 |         0 |     0 |
| 2019-06-01 |      12 |   1 |        0 |        0 |         0 |     0 |
| 2019-06-01 |      13 |   1 |        0 |        0 |         0 |     0 |

Answer 1 (score: 1)

Try this:

select a.[date], a.city_id
-- dau: distinct riders with a trip on this date
,(select count(distinct b.[rider_id]) from #Tripdata b where b.city_id = a.city_id and b.[date] = a.[date]) as [dau]
-- wau: distinct riders with a trip between date - 7 and date (inclusive)
,(select count(distinct b.[rider_id]) from #Tripdata b where b.city_id = a.city_id and b.[date] between dateadd(day, -7, a.[date]) and a.[date]) as [wau]

-- new_rider: riders with a trip in that window and no trip before it
,(select count(distinct b.[rider_id]) from #Tripdata b where b.city_id = a.city_id
    and b.[date] between dateadd(day, -7, a.[date]) and a.[date]
    and b.[rider_id] NOT IN (select c.[rider_id] from #Tripdata c where c.[date] < dateadd(day, -7, a.[date]))
) as [new_rider]

-- previous_mau: riders with at least one trip between date - 56 and date - 29
,(select count(distinct b.[rider_id]) from #Tripdata b where b.city_id = a.city_id
    and b.[date] between dateadd(day, -56, a.[date]) and dateadd(day, -29, a.[date])
) as [previous_mau]

from #Tripdata a
group by a.[date], a.city_id
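The remaining metrics from the question (mau_28, retained, resurrected and churned) could be produced per date and city with a range join instead of correlated subqueries. The following is a sketch under stated assumptions, not a definitive solution: the table variable @trips and its rows are hypothetical, the windows are taken as [date - 27, date] for mau_28 and [date - 55, date - 28] for previous_mau (other off-by-one readings of the definitions are possible), and the trip's own city_id is used, so the rider-to-city mapping from the question is not applied.

```sql
-- Hypothetical sample data: one row per completed trip.
DECLARE @trips TABLE ([date] date, rider_id int, city_id int);
INSERT INTO @trips ([date], rider_id, city_id) VALUES
    ('2019-04-15', 3, 5),
    ('2019-04-20', 1, 5),
    ('2019-05-20', 1, 5),
    ('2019-06-01', 1, 5),
    ('2019-06-01', 2, 5);

WITH report_days AS (
    -- One output row per date and city that has at least one trip.
    SELECT DISTINCT [date], city_id FROM @trips
),
flags AS (
    -- For each report day, one row per rider seen in the trailing 56 days,
    -- flagged by which of the two assumed 28-day periods they rode in.
    SELECT d.[date], d.city_id, t.rider_id,
           MAX(CASE WHEN t.[date] >= DATEADD(DAY, -27, d.[date]) THEN 1 ELSE 0 END) AS in_mau_28,
           MAX(CASE WHEN t.[date] <= DATEADD(DAY, -28, d.[date]) THEN 1 ELSE 0 END) AS in_previous_mau
    FROM report_days d
    JOIN @trips t
      ON t.city_id = d.city_id
     AND t.[date] BETWEEN DATEADD(DAY, -55, d.[date]) AND d.[date]
    GROUP BY d.[date], d.city_id, t.rider_id
)
SELECT [date], city_id,
       SUM(in_previous_mau) AS previous_mau,
       SUM(in_mau_28)       AS mau_28,
       SUM(CASE WHEN in_previous_mau = 1 AND in_mau_28 = 1 THEN 1 ELSE 0 END) AS retained,
       SUM(CASE WHEN in_previous_mau = 0 AND in_mau_28 = 1 THEN 1 ELSE 0 END) AS resurrected,
       SUM(CASE WHEN in_previous_mau = 1 AND in_mau_28 = 0 THEN 1 ELSE 0 END) AS churned
FROM flags
GROUP BY [date], city_id
ORDER BY [date], city_id;
```

For the 2019-06-01 row of city 5, rider 1 rode in both periods (retained), rider 2 only in the recent one (resurrected) and rider 3 only in the earlier one (churned), so that row shows previous_mau = 2, mau_28 = 2, retained = 1, resurrected = 1, churned = 1. Under this reading a brand-new rider also counts as resurrected, because the definition only requires inactivity in the previous period.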