Standard deviation of a set of dates

Time: 2016-12-08 16:40:49

Tags: sql postgresql statistics

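I have a transactions table with the columns id | client_id | datetime, and I have calculated the average number of days between transactions to find out how often each client makes this kind of transaction: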

SELECT *, ((date_last_transaction - date_first_transaction)/total_transactions) AS frequency 
FROM (
    SELECT client_id, COUNT(id) AS total_transactions, MIN(datetime) AS date_first_transaction, MAX(datetime) AS date_last_transaction
    FROM transactions
    GROUP BY client_id
) AS t;

Is there an existing way to calculate the standard deviation (in days) of a set of dates in PostgreSQL? Preferably in a single query, if that is possible :-)

1 answer:

Answer 0 (score: 1)

I found a way to do it:

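-- Assumes datetime is of type date, so date - date yields an integer number of days.
-- For a timestamp column the difference is an interval and the COALESCE default 0 would not type-check.
-- The outer query aggregates over all rows; select client_id and add GROUP BY client_id for per-client figures.
SELECT extract(day from date_trunc('day', (
        CASE WHEN COUNT(*) <= 1 THEN 
            0 
        ELSE 
            -- The first row of each client contributes 0, so divide by COUNT(*) - 1.
            SUM(time_since_last_invoice)/(COUNT(*)-1) 
        END
    ) * '1 day'::interval)) AS days_between_purchases, 
    extract(day from date_trunc('day', (
        CASE WHEN COUNT(*) <= 2 THEN 
            0 
        ELSE 
            -- Note: the 0 placeholder rows are included in this STDDEV.
            STDDEV(time_since_last_invoice) 
        END
    ) * '1 day'::interval)) AS range_of_days
FROM (
    -- lag() ignores the window frame, so no ROWS clause is needed.
    SELECT client_id, datetime, COALESCE(datetime - lag(datetime) 
              OVER (PARTITION BY client_id ORDER BY datetime), 0
           ) AS time_since_last_invoice
    FROM my_table 
    GROUP BY client_id, datetime
) AS gaps;  -- a subquery in FROM needs an alias (required before PostgreSQL 16)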

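Explanation: The inner query groups by client and date, then computes the difference between each pair of consecutive transaction dates within a client_id (ordered by datetime), and returns a table with those results. The outer query then processes that table and computes the average time between transactions. The first value in each group is the first transaction, so its interval is 0 and it is excluded from the count of gaps. The standard deviation is only calculated when there are more than 2 transaction dates, to avoid a division-by-zero error. The values are converted to PostgreSQL intervals, and extract(day ...) then returns the whole number of days.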
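
As a variation on the same idea, here is a minimal per-client sketch. It leaves the gap of each client's first transaction as NULL instead of 0, so that the aggregates simply skip it. It assumes the question's transactions(id, client_id, datetime) layout with datetime stored as a timestamp. The sample rows in the CTE and the names gap_days, n_gaps, avg_days_between and stddev_days are made up for illustration.

```sql
-- Hypothetical sample data standing in for the real table.
-- Drop this CTE to run the query against an actual transactions table.
WITH transactions (id, client_id, datetime) AS (
    VALUES (1, 1, TIMESTAMP '2016-01-01 10:00'),
           (2, 1, TIMESTAMP '2016-01-04 10:00'),
           (3, 1, TIMESTAMP '2016-01-11 10:00'),
           (4, 2, TIMESTAMP '2016-02-01 09:00')
),
gaps AS (
    SELECT client_id,
           -- timestamp - timestamp is an interval; convert it to fractional days.
           -- lag() returns NULL for the first transaction of each client.
           EXTRACT(EPOCH FROM datetime - LAG(datetime)
               OVER (PARTITION BY client_id ORDER BY datetime)) / 86400.0 AS gap_days
    FROM transactions
)
SELECT client_id,
       COUNT(gap_days)       AS n_gaps,            -- non-NULL gaps = transactions - 1
       AVG(gap_days)         AS avg_days_between,  -- NULL when the client has no gaps
       STDDEV_SAMP(gap_days) AS stddev_days        -- NULL when fewer than 2 gaps
FROM gaps
GROUP BY client_id
ORDER BY client_id;
```

With the sample rows, client 1 has gaps of 3 and 7 days, so the average is 5 days and the sample standard deviation is sqrt(8), about 2.83 days. Client 2 has a single transaction and therefore gets NULL for both figures, with no CASE guards needed.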