| user_id | product_type | reservation_date | used_date |
| 12345 | A | 2016-06-01 | 2016-06-24 |
| 12345 | B | 2016-06-03 | 2016-06-24 |
| 12345 | C | 2016-07-02 | 2016-07-30 |
| 12346 | A | 2016-06-27 | 2016-07-24 |
| 12346 | B | 2016-06-29 | 2016-07-22 |
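I want to measure the "cross-selling" effect on our platform.
In the table above, user_id 12345 has purchased product_type A, B, and C within one month (one day).
I want to count the number of users who purchased any type of product, but only those with at least two different types within 30 days of reservation_date.
Is there a way to do this? I wrote a query like the one below, but found it to be inaccurate, because I cannot express the date condition needed to produce the output I want.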
SELECT
  DATE_TRUNC('month', reservation.date),
  COUNT(DISTINCT users.id)
FROM reservation
INNER JOIN products ON products.id = reservation.product_id
INNER JOIN users ON users.id = reservation.user_id
WHERE products.type = 'A'
AND users.id IN (
  SELECT users.id
  FROM reservation
  INNER JOIN products ON products.id = reservation.product_id
  INNER JOIN users ON users.id = reservation.user_id
  WHERE products.type IN ('B','C')
)
GROUP BY 1 ORDER BY 1 DESC;
Answer 0 (score: 0)
A basic query might look like this (assuming you want to treat each calendar month as the time frame):
SELECT user_id, date_trunc('month', reservation_date)
, count(DISTINCT product_type) AS ct
FROM reservation
GROUP BY 1,2
HAVING count(DISTINCT product_type) > 1
ORDER BY 1 DESC;
To get the actual number of qualifying users:
SELECT count(DISTINCT user_id)
FROM (
SELECT user_id
FROM reservation
GROUP BY user_id, date_trunc('month', reservation_date)
HAVING count(DISTINCT product_type) > 1
) sub;
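Based on your comment:
...count the number of users who made reservations each month (for at least 2 product types) with 30 days between the reservations. So if I reserve product A in July and reserve product B 15 days later, that is not counted in that month's number.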
SELECT date_trunc('month', reservation_date), count(DISTINCT user_id) AS ct_users
FROM reservation r
WHERE EXISTS (
SELECT 1
FROM reservation
WHERE user_id = r.user_id
AND reservation_date <= r.reservation_date - 30 -- assuming data type date!
AND product_type <> r.product_type
)
GROUP BY 1;
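This returns, for each month, the number of users who also made a reservation for a different product type at least 30 days earlier.
Over a longer period of time, it would be more efficient to remember users who already qualified and test only the remaining ones.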
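If the goal is instead the question's original wording (at least two different product types reserved within 30 days of each other), the EXISTS condition can be turned into a window check. The following is only a sketch: the symmetric 30-day window, the alias r2, and the inline sample data (copied from the question's table) are assumptions, and reservation_date is again assumed to be of type date.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
WITH reservation (user_id, product_type, reservation_date) AS (
   VALUES
     (12345, 'A', DATE '2016-06-01')
   , (12345, 'B', DATE '2016-06-03')
   , (12345, 'C', DATE '2016-07-02')
   , (12346, 'A', DATE '2016-06-27')
   , (12346, 'B', DATE '2016-06-29')
)
SELECT date_trunc('month', r.reservation_date) AS month
     , count(DISTINCT r.user_id) AS ct_users
FROM   reservation r
WHERE  EXISTS (
   SELECT 1
   FROM   reservation r2
   WHERE  r2.user_id = r.user_id
   AND    r2.product_type <> r.product_type
   -- assumed window: a different type reserved up to 30 days before or after
   AND    r2.reservation_date BETWEEN r.reservation_date - 30
                                  AND r.reservation_date + 30
)
GROUP  BY 1
ORDER  BY 1;
```

With the sample rows this counts two users for June 2016 and one for July 2016, because the July reservation of product C by user 12345 falls 29 days after that user's reservation of product B.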
Answer 1 (score: 0)
Something like this might work:
SELECT COUNT(DISTINCT(r.user_id))
FROM reservation r
INNER JOIN reservation r_a
ON r_a.user_id = r.user_id
AND r_a.product_type <> r.product_type
AND @extract(day FROM (r_a.reservation_date::TIMESTAMP - r.reservation_date::TIMESTAMP)) <= 30
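Here @ is PostgreSQL's absolute-value operator, so
@extract(timepart FROM (one_timestamp - another_timestamp))
equals the absolute value of the given "timepart" of the difference between the two timestamps.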
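To see what that expression evaluates to, here is a small standalone check using two dates from the question's table. The column aliases are made up for illustration, and the second column shows an alternative that is assumed to be equivalent when the columns are plain dates.

```sql
SELECT -- timestamp - timestamp yields an interval ('-29 days' here);
       -- extract(day ...) reads its days field and @ takes the absolute value
       @ extract(day FROM (TIMESTAMP '2016-06-03' - TIMESTAMP '2016-07-02')) AS abs_days_ts
       -- date - date yields an integer number of days directly
     , abs(DATE '2016-06-03' - DATE '2016-07-02') AS abs_days_date;
```

Both columns come out as 29, so this pair of reservations would satisfy the <= 30 condition in the join.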