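I'm trying to build a basic table that shows retention from one month to the next. If someone bought something last month and buys again the following month, that should be counted.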
month, num_transactions, repeat_transactions, retention
2012-02, 5, 2, 40%
2012-03, 10, 3, 30%
2012-04, 15, 8, 53%
So if everyone who bought last month buys again the following month, you get 100%.
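As a quick check of the example table, the retention column appears to be repeat_transactions divided by num_transactions for the same row, rounded to a whole percent. That reading is an assumption taken from the sample numbers, and the helper name `retention_pct` is purely illustrative:

```python
def retention_pct(repeat_transactions, num_transactions):
    """Share of a month's transactions that are repeats, as a whole-number percentage."""
    if num_transactions == 0:
        return 0  # avoid division by zero for months without transactions
    return round(100 * repeat_transactions / num_transactions)


# Rows from the example table: (month, num_transactions, repeat_transactions)
for month, num, rep in [("2012-02", 5, 2), ("2012-03", 10, 3), ("2012-04", 15, 8)]:
    print(month, f"{retention_pct(rep, num)}%")
```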
So far I have only managed to calculate this manually. This query gives me the rows (buyers) seen in both months:
select count(*) as num_repeat_buyers from
    (select distinct
        to_char(transaction.timestamp, 'YYYY-MM') as month,
        auth_user.email
    from
        auth_user,
        transaction
    where
        auth_user.id = transaction.buyer_id and
        to_char(transaction.timestamp, 'YYYY-MM') = '2012-03'
    ) as table1,
    (select distinct
        to_char(transaction.timestamp, 'YYYY-MM') as month,
        auth_user.email
    from
        auth_user,
        transaction
    where
        auth_user.id = transaction.buyer_id and
        to_char(transaction.timestamp, 'YYYY-MM') = '2012-04'
    ) as table2
where table1.email = table2.email
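The following isn't right, but I have a feeling I could use some of Postgres' window functions. Keep in mind that window functions don't let you specify a WHERE clause; you mostly get access to the preceding and following rows: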
select month, count(*) as num_transactions, count(*) over (PARTITION BY month ORDER BY month)
from
    (select distinct
        to_char(transaction.timestamp, 'YYYY-MM') as month,
        auth_user.email
    from
        auth_user,
        transaction
    where
        auth_user.id = transaction.buyer_id
    order by
        month
    ) as transactions_by_month
group by
    month
Answer 0 (score: 5)
Given the following test table (which you should have provided):
CREATE TEMP TABLE transaction (buyer_id int, tstamp timestamp);
INSERT INTO transaction VALUES
(1,'2012-01-03 20:00')
,(1,'2012-01-05 20:00')
,(1,'2012-01-07 20:00') -- multiple transactions this month
,(1,'2012-02-03 20:00') -- next month
,(1,'2012-03-05 20:00') -- next month
,(2,'2012-01-07 20:00')
,(2,'2012-03-07 20:00') -- not next month
,(3,'2012-01-07 20:00') -- just once
,(4,'2012-02-07 20:00'); -- just once
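The table auth_user is not relevant to the problem.
I use tstamp as the column name because I don't use base type names as identifiers.
I will use the window function lag() to identify repeat buyers. To keep it short, I combine aggregate and window functions in a single query level. Bear in mind that window functions are applied after aggregate functions.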
WITH t AS (
   SELECT buyer_id
         ,date_trunc('month', tstamp) AS month
         ,count(*) AS item_transactions
         ,lag(date_trunc('month', tstamp)) OVER (PARTITION BY buyer_id
                                           ORDER BY date_trunc('month', tstamp))
          = date_trunc('month', tstamp) - interval '1 month'
            OR NULL AS repeat_transaction
   FROM transaction
   WHERE tstamp >= '2012-01-01'::date
   AND tstamp < '2012-05-01'::date -- time range of interest.
   GROUP BY 1, 2
   )
SELECT month
      ,sum(item_transactions) AS num_trans
      ,count(*) AS num_buyers
      ,count(repeat_transaction) AS repeat_buyers
      ,round(
          CASE WHEN sum(item_transactions) > 0
             THEN count(repeat_transaction) / sum(item_transactions) * 100
             ELSE 0
          END, 2) AS buyer_retention_pct
FROM t
GROUP BY 1
ORDER BY 1;
Result:
 month   | num_trans | num_buyers | repeat_buyers | buyer_retention_pct
---------+-----------+------------+---------------+--------------------
 2012-01 |         5 |          3 |             0 |                0.00
 2012-02 |         2 |          2 |             1 |               50.00
 2012-03 |         2 |          2 |             1 |               50.00
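I extended your question to account for the difference between the number of transactions and the number of buyers.
The OR NULL in the repeat_transaction expression converts FALSE to NULL, so that count() in the next step does not count those values.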
Answer 1 (score: 0)
This uses CASE and EXISTS to get the repeated transactions:
SELECT
    *,
    CASE
        WHEN num_transactions = 0
        THEN 0
        ELSE round(100.0 * repeat_transactions / num_transactions, 2)
    END AS retention
FROM
    (
        SELECT
            to_char(timestamp, 'YYYY-MM') AS month,
            count(*) AS num_transactions,
            sum(CASE
                WHEN EXISTS (
                    SELECT 1
                    FROM transaction AS t
                    JOIN auth_user AS u
                    ON t.buyer_id = u.id
                    WHERE
                        date_trunc('month', transaction.timestamp)
                            + interval '1 month'
                            = date_trunc('month', t.timestamp)
                        AND auth_user.email = u.email
                )
                THEN 1
                ELSE 0
            END) AS repeat_transactions
        FROM
            transaction
            JOIN auth_user
            ON transaction.buyer_id = auth_user.id
        GROUP BY 1
    ) AS summary
ORDER BY 1;
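Edit: After reading the question again, I changed this from minus 1 month to plus 1 month. My understanding now is that if someone buys something in 2012-02 and then buys something again in 2012-03, their transactions in 2012-02 count toward retention for that month (2012-02).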
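To make this "plus 1 month" reading concrete, here is a rough Python sketch that applies it to the test rows from the other answer. These are assumptions of the sketch, not part of the query: buyer_id stands in for auth_user.email, and the names `rows`, `month_index` and `forward_retention` are illustrative. The numbers it prints come from the sketch itself, not from running the SQL.

```python
from collections import defaultdict
from datetime import datetime

# Test rows borrowed from the other answer: (buyer_id, tstamp)
rows = [
    (1, "2012-01-03 20:00"), (1, "2012-01-05 20:00"), (1, "2012-01-07 20:00"),
    (1, "2012-02-03 20:00"), (1, "2012-03-05 20:00"),
    (2, "2012-01-07 20:00"), (2, "2012-03-07 20:00"),
    (3, "2012-01-07 20:00"),
    (4, "2012-02-07 20:00"),
]


def month_index(ts):
    """Map a timestamp string to a month counter; consecutive months differ by 1."""
    d = datetime.strptime(ts, "%Y-%m-%d %H:%M")
    return d.year * 12 + (d.month - 1)


def forward_retention(rows):
    # Every (buyer, month) pair with at least one transaction.
    active = {(buyer, month_index(ts)) for buyer, ts in rows}

    # month -> [num_transactions, repeat_transactions]
    stats = defaultdict(lambda: [0, 0])
    for buyer, ts in rows:
        m = month_index(ts)
        stats[m][0] += 1
        # Mirrors the EXISTS test: the same buyer has a transaction in month m + 1.
        if (buyer, m + 1) in active:
            stats[m][1] += 1

    return {
        f"{m // 12:04d}-{m % 12 + 1:02d}": (num, rep, round(100.0 * rep / num, 2))
        for m, (num, rep) in sorted(stats.items())
    }


for month, values in forward_retention(rows).items():
    print(month, values)
```

Unlike the lag() approach, this counts individual transactions rather than distinct buyers, and it attributes the repeat to the earlier month. That is why buyer 1's three January transactions all count as repeats here.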