How to find the number of purchases within time intervals in SQL

Asked: 2015-02-15 20:40:19

Tags: sql postgresql pandas amazon-redshift

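I'm using Redshift (Postgres) and Pandas for my work. I'm trying to count user actions; let's say purchases, to make it easier to understand. I have a table, purchases, that contains the following data: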

user_id, timestamp,  price
1,       2015-02-01, 200
1,       2015-02-02, 50
1,       2015-02-10, 75

Ultimately I want the number of purchases within specific time windows, such as:

userid, 28-14_days, 14-7_days, 7

This is what I have so far. I know it has no upper bound on the dates:

SELECT DISTINCT x_days.user_id, SUM(x_days.purchases) AS x_num, SUM(y_days.purchases) AS y_num,
       x_days.x_date, y_days.y_date
FROM
(
    SELECT purchases.user_id, COUNT(purchases.user_id) AS purchases,
           DATE(purchases.timestamp) AS x_date
    FROM purchases
    WHERE purchases.timestamp > (current_date - INTERVAL '%(x_days_ago)s day') AND
          purchases.max_value > 200
    GROUP BY DATE(purchases.timestamp), purchases.user_id
) AS x_days
JOIN
(
    SELECT purchases.user_id, COUNT(purchases.user_id) AS purchases,
           DATE(purchases.timestamp) AS y_date
    FROM purchases
    WHERE purchases.timestamp > (current_date - INTERVAL '%(y_days_ago)s day') AND
          purchases.max_value > 200
    GROUP BY DATE(purchases.timestamp), purchases.user_id
) AS y_days
ON x_days.user_id = y_days.user_id
GROUP BY x_days.user_id, x_days.x_date, y_days.y_date

params={'x_days_ago':x_days_ago, 'y_days_ago':y_days_ago}

where these are set in python/pandas:

x_days_ago = 14
y_days_ago = 7

But this doesn't work quite as planned:

    user_id x_num   y_num   x_date      y_date
0   5451772 1       1       2015-02-10  2015-02-10
1   5026678 1       1       2015-02-09  2015-02-09
2   6337993 2       1       2015-02-14  2015-02-13
3   6204432 1       3       2015-02-10  2015-02-11
4   3417539 1       1       2015-02-11  2015-02-11

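Even though I have no upper date bound (so x effectively searches from 14 days ago to now and y from 7 days ago to now, meaning the two windows overlap), y is higher in some cases.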
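
One possible reading of that output, sketched in plain Python with made-up per-day counts: both subqueries group by day, and the join matches on user_id only, so every x day is paired with every y day for the same user. Each output row would then compare counts from two different days rather than totals for the two windows. The `join_on_user` helper and the sample counts are hypothetical and only mimic the shape of the query.

```python
from itertools import product

# Hypothetical per-day purchase counts for one user. Both days fall inside
# the 7-day and the 14-day window, so both subqueries return the same rows.
per_day = [("2015-02-10", 1), ("2015-02-11", 3)]


def join_on_user(x_days, y_days):
    """Mimic JOIN ... ON user_id, then GROUP BY user_id, x_date, y_date.

    Joining on user_id alone pairs every x row with every y row, and the
    final GROUP BY keeps one output row per (x_date, y_date) pair.
    """
    return [
        {"x_date": x_date, "y_date": y_date, "x_num": x_num, "y_num": y_num}
        for (x_date, x_num), (y_date, y_num) in product(x_days, y_days)
    ]


rows = join_on_user(per_day, per_day)
for row in rows:
    print(row)
```

Under these assumptions the pairing yields four rows for a single user, including one with x_date 2015-02-10, y_date 2015-02-11, x_num 1 and y_num 3, which has the same shape as row 3 in the output above.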

Can anyone help me fix this or suggest a better approach?

Thanks!

1 Answer:

Answer 0: (score: 1)

This may not be the most efficient answer, but you can generate each sum with a sub-select:

WITH
  summed AS (
    SELECT user_id, day, COUNT(1) AS purchases
      FROM (SELECT user_id, DATE(timestamp) AS day FROM purchases) AS _
     GROUP BY user_id, day
  ),
  users AS (SELECT DISTINCT user_id FROM purchases)
SELECT user_id,
       (SELECT SUM(purchases) FROM summed
         WHERE summed.user_id = users.user_id
           AND day >= DATE(NOW() - interval ' 7 days')) AS days_7,
       (SELECT SUM(purchases) FROM summed
         WHERE summed.user_id = users.user_id
           AND day >= DATE(NOW() - interval '14 days')) AS days_14
  FROM users;

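(This was tested in Postgres, not in Redshift; but the Redshift documentation indicates that both WITH and DISTINCT are supported.) I would have liked to do this with a window function, to get rolling sums; but without generate_series(), that would be too cumbersome.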
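
As a possible alternative to the sub-selects, the non-overlapping windows from the question (28-14 days, 14-7 days, last 7 days) can be counted in a single pass with conditional aggregation, i.e. SUM over CASE expressions. The sketch below is an assumption-laden illustration, not a tested Redshift query: it runs against an in-memory SQLite database so that it is self-contained, it uses SQLite's date() function rather than Redshift date arithmetic, it passes a fixed `today` parameter instead of the current date, and the row for user 2 is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (user_id INTEGER, timestamp TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, "2015-02-01", 200),  # sample rows from the question
        (1, "2015-02-02", 50),
        (1, "2015-02-10", 75),
        (2, "2015-01-20", 120),  # made-up row for a second user
    ],
)

# One scan of the table; each CASE adds 1 only for rows inside its window.
# Lower bounds are inclusive and upper bounds exclusive, so no purchase is
# counted in two windows.
query = """
SELECT user_id,
       SUM(CASE WHEN date(timestamp) >= date(:today, '-28 days')
                 AND date(timestamp) <  date(:today, '-14 days')
                THEN 1 ELSE 0 END) AS days_28_14,
       SUM(CASE WHEN date(timestamp) >= date(:today, '-14 days')
                 AND date(timestamp) <  date(:today, '-7 days')
                THEN 1 ELSE 0 END) AS days_14_7,
       SUM(CASE WHEN date(timestamp) >= date(:today, '-7 days')
                THEN 1 ELSE 0 END) AS days_7
  FROM purchases
 GROUP BY user_id
 ORDER BY user_id
"""

rows = conn.execute(query, {"today": "2015-02-15"}).fetchall()
print(rows)
```

The CASE expressions are standard SQL, but the date functions would need translating before running this on Redshift. Unlike the sub-select version, a window with no purchases comes back as 0 rather than NULL.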