I have a table, Subscriptions, in PostgreSQL 10.5:
id  user_id  starts_at  ends_at
--------------------------------
1   233      02/04/19   03/03/19
2   233      03/04/19   04/03/19
3   296      02/09/19   03/08/19
4   126      02/01/19   02/28/19
5   126      03/01/19   03/31/19
6   922      02/22/19   03/22/19
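I want to calculate, week by week, how many new subscribers we have. A new subscriber is any user_id that has no subscription entry before that week.
Edit: I have made some modifications to @fubar's solution to fit the date format I want to use. One clarification I forgot to add here is that I also want to see the weeks with 0 subscribers. How can I integrate generate_series into the query below so that weeks with 0 subscribers are shown as well?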
SELECT TO_CHAR(date_trunc('week', s.starts_at), 'YYYY-MM-DD') as week, COUNT(*) AS count
FROM subscriptions s
LEFT JOIN subscriptions s1 ON s.user_id = s1.user_id AND s.starts_at > s1.starts_at
WHERE s1.id IS NULL
GROUP BY week
ORDER BY week desc
Answer 0 (score: 3)
You can find each user's first subscription with the following query:
SELECT s.*
FROM subscriptions s
LEFT JOIN subscriptions s1 ON s.user_id = s1.user_id AND s.starts_at > s1.starts_at
WHERE s1.id IS NULL
You can then count the number of new subscribers per year/week with the following query:
SELECT
EXTRACT(YEAR FROM s.starts_at) AS year,
EXTRACT(WEEK FROM s.starts_at) AS week,
COUNT(*) AS count
FROM subscriptions s
LEFT JOIN subscriptions s1 ON s.user_id = s1.user_id AND s.starts_at > s1.starts_at
WHERE s1.id IS NULL
GROUP BY year, week;
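One caveat with the year/week query above: in PostgreSQL, EXTRACT(WEEK ...) returns the ISO 8601 week number, while EXTRACT(YEAR ...) returns the calendar year, so dates near New Year can end up in a mismatched pair (2018-12-31, for example, belongs to ISO week 1 but calendar year 2018). A minimal sketch of a variant that pairs the week with ISOYEAR instead, assuming the same subscriptions table; the iso_year and iso_week aliases are illustrative names:

```sql
-- Sketch: same anti-join as above, grouped by ISO year + ISO week
SELECT
  EXTRACT(ISOYEAR FROM s.starts_at) AS iso_year,
  EXTRACT(WEEK FROM s.starts_at) AS iso_week,
  COUNT(*) AS count
FROM subscriptions s
LEFT JOIN subscriptions s1
  ON s.user_id = s1.user_id
 AND s.starts_at > s1.starts_at   -- s1 = any earlier subscription of the same user
WHERE s1.id IS NULL                -- keep only rows with no earlier subscription
GROUP BY iso_year, iso_week
ORDER BY iso_year, iso_week;
```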
Here is an updated query that combines my answer above with generate_series() and your preferred week date format.
SELECT
TO_CHAR(date_trunc('week', w.date), 'YYYY-MM-DD') AS week,
COUNT(DISTINCT s.*) AS count
FROM generate_series('2018-12-31', NOW(), INTERVAL '1 WEEK') w(date)
LEFT JOIN subscriptions s ON s.starts_at BETWEEN w.date AND w.date + INTERVAL '6 DAY'
LEFT JOIN subscriptions s1 ON s.user_id = s1.user_id AND s.starts_at > s1.starts_at
WHERE s1.id IS NULL
GROUP BY w.date;
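One thing to watch in the query above: the WHERE s1.id IS NULL filter runs after both joins, so a week whose only subscriptions are renewals (the week of 2019-03-04 in the question's sample data contains only user 233's second subscription) is dropped from the result instead of being reported as 0. A sketch of one possible workaround, assuming the same table, moves the first-subscription test into the join condition with NOT EXISTS. The half-open week range, the COUNT(DISTINCT s.user_id), the week_start alias, and the ORDER BY are additions for illustration and are not part of the original answer:

```sql
SELECT
  TO_CHAR(w.week_start, 'YYYY-MM-DD') AS week,
  COUNT(DISTINCT s.user_id) AS count   -- NULLs are ignored, so empty weeks yield 0
FROM generate_series('2018-12-31'::timestamp, NOW()::timestamp, INTERVAL '1 week') AS w(week_start)
LEFT JOIN subscriptions s
  ON s.starts_at >= w.week_start
 AND s.starts_at <  w.week_start + INTERVAL '1 week'
 -- only join rows that are the user's first subscription
 AND NOT EXISTS (
       SELECT 1
       FROM subscriptions s1
       WHERE s1.user_id = s.user_id
         AND s1.starts_at < s.starts_at)
GROUP BY w.week_start
ORDER BY w.week_start DESC;
```

Because the filter lives in the ON clause rather than in WHERE, every generated week keeps its row even when no first subscription matches it.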
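Answer 1 (score: 0)
+1 to fubar's solution; the self-join it relies on works in any RDBMS.
I'll offer another approach, which PostgreSQL makes easy thanks to DISTINCT ON, for finding the date of each user's first subscription: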
select
distinct on (s.user_id)
s.*
from subscriptions s
order by s.user_id, s.starts_at;
Output:
| id | user_id | starts_at | ends_at |
| --- | ------- | ------------------------ | ------------------------ |
| 4 | 126 | 2019-02-01T00:00:00.000Z | 2019-02-28T00:00:00.000Z |
| 1 | 233 | 2019-01-04T00:00:00.000Z | 2019-03-03T00:00:00.000Z |
| 3 | 296 | 2019-02-09T00:00:00.000Z | 2019-03-08T00:00:00.000Z |
| 6 | 922 | 2019-02-22T00:00:00.000Z | 2019-03-22T00:00:00.000Z |
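When only the first start date per user is needed rather than the whole row, a plain aggregate is a portable alternative to DISTINCT ON. A minimal sketch against the same table; the first_starts_at alias is an illustrative name:

```sql
-- Sketch: earliest subscription start per user, without DISTINCT ON
SELECT user_id, MIN(starts_at) AS first_starts_at
FROM subscriptions
GROUP BY user_id
ORDER BY user_id;
```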
Schema
CREATE TABLE subscriptions (
id INT NOT NULL,
user_id INT NOT NULL,
starts_at DATE,
ends_at DATE,
PRIMARY KEY(id)
);
INSERT INTO subscriptions VALUES
(1, 233, '2019-01-04', '2019-03-03'),
(2, 233, '2019-03-04', '2019-04-04'),
(3, 296, '2019-02-09', '2019-03-08'),
(4, 126, '2019-02-01', '2019-02-28'),
(5, 126, '2019-03-01', '2019-03-31'),
(6, 922, '2019-02-22', '2019-03-22');
Get the number of new subscribers per week
Live test: https://www.db-fiddle.com/f/vhzw4KvANA6Mvi59NDTy3H/0
with first_time
as
(
select
distinct on (s.user_id)
s.*
from subscriptions s
order by s.user_id, s.starts_at
)
select gs.wk, count(ft.*) as new_subscribers_for_the_week
from
generate_series('2019-02-25'::date, now()::date, interval '1 week') gs(wk)
left join first_time ft
on gs.wk >= ft.starts_at and gs.wk <= ft.ends_at
group by gs.wk
order by gs.wk;
Output:
| wk | new_subscribers_for_the_week |
| ------------------------ | ---------------------------- |
| 2019-02-25T00:00:00.000Z | 4 |
| 2019-03-04T00:00:00.000Z | 2 |
| 2019-03-11T00:00:00.000Z | 1 |
| 2019-03-18T00:00:00.000Z | 1 |
| 2019-03-25T00:00:00.000Z | 0 |
| 2019-04-01T00:00:00.000Z | 0 |
| 2019-04-08T00:00:00.000Z | 0 |
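Note that the join condition above matches a first subscription to every week whose start date lies between starts_at and ends_at, so the counts show how many first-time subscriptions were still active at the start of each week (which is why all four appear under 2019-02-25). To count each user only in the week in which the first subscription started, one option is to compare date_trunc('week', starts_at) with the series value. A sketch under that assumption, reusing the schema above; the series start of 2018-12-31 is an arbitrary Monday chosen to cover the sample rows, and the explicit casts to timestamp are there to keep both sides of the comparison free of time zone conversions:

```sql
WITH first_time AS (
  -- first subscription row per user
  SELECT DISTINCT ON (s.user_id) s.*
  FROM subscriptions s
  ORDER BY s.user_id, s.starts_at
)
SELECT gs.wk::date AS wk,
       COUNT(ft.id) AS new_subscribers_for_the_week   -- 0 for weeks with no match
FROM generate_series('2018-12-31'::timestamp, now()::timestamp, INTERVAL '1 week') AS gs(wk)
LEFT JOIN first_time ft
  -- date_trunc('week', ...) snaps to the Monday that starts the week
  ON date_trunc('week', ft.starts_at::timestamp) = gs.wk
GROUP BY gs.wk
ORDER BY gs.wk;
```

With the sample rows from the schema, the four first subscriptions should fall in the weeks starting 2018-12-31, 2019-01-28, 2019-02-04 and 2019-02-18, and every other generated week should report 0.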