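I have a transactions table with the columns id | client_id | datetime, and I calculated the average number of days between transactions to see how often each client makes a transaction: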
SELECT *, ((date_last_transaction - date_first_transaction)/total_transactions) AS frequency
FROM (
SELECT client_id, COUNT(id) AS total_transactions, MIN(datetime) AS date_first_transaction, MAX(datetime) AS date_last_transaction
FROM transactions
GROUP BY client_id
) AS t;
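Is there an existing way to calculate the standard deviation (in days) of a set of dates with PostgreSQL? Preferably in a single query, if that is feasible :-)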
Answer 0 (score: 1)
I found a way to do it:
SELECT client_id,
       -- average gap in whole days; COUNT(*) - 1 leaves out each client's first row
       extract(day from date_trunc('day', (
           CASE WHEN COUNT(*) <= 1 THEN
               0
           ELSE
               SUM(time_since_last_invoice) / (COUNT(*) - 1)
           END
       ) * '1 day'::interval)) AS days_between_purchases,
       -- standard deviation of the gaps; NULLIF skips the placeholder 0 of the first row
       extract(day from date_trunc('day', (
           CASE WHEN COUNT(*) <= 2 THEN
               0
           ELSE
               STDDEV(NULLIF(time_since_last_invoice, 0))
           END
       ) * '1 day'::interval)) AS range_of_days
FROM (
    -- assumes datetime is of type date, so the subtraction yields an integer number of days
    SELECT client_id, datetime,
           COALESCE(datetime - lag(datetime) OVER (
               PARTITION BY client_id ORDER BY datetime
           ), 0) AS time_since_last_invoice
    FROM my_table
    GROUP BY client_id, datetime
) AS t
GROUP BY client_id
ORDER BY client_id;
Explanation:
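The inner query groups by client and date, then computes the difference between each pair of consecutive transaction dates (datetime) for each client (client_id), and returns a table with these results. The outer query then processes that table and computes the average over the differences greater than 0 (the first value in each group is excluded because it belongs to the client's first transaction, so its interval is 0).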
The standard deviation is calculated only when the same client has more than 2 transaction dates, to avoid a division-by-zero error.
All differences are converted to the PostgreSQL interval format and then reduced to whole days with extract(day from ...).
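For comparison, the same idea can probably be written more compactly. The following is only a sketch, not a tested replacement: it assumes the table is called transactions (as in the question) and that datetime is a timestamp column, and the names gaps, gap_days, gap_count, avg_days_between and stddev_days are made up for illustration. Each gap is converted to fractional days through EXTRACT(EPOCH ...), and the first transaction of each client is left as NULL so that the aggregates skip it without special cases.

```sql
-- Sketch: per-client mean and sample standard deviation of the gaps, in days.
-- Assumed schema: transactions(id, client_id, datetime timestamp).
WITH gaps AS (
    SELECT client_id,
           EXTRACT(EPOCH FROM datetime - lag(datetime) OVER (
               PARTITION BY client_id ORDER BY datetime
           )) / 86400.0 AS gap_days   -- NULL for each client's first transaction
    FROM transactions
)
SELECT client_id,
       COUNT(gap_days)       AS gap_count,         -- number of gaps, NULLs not counted
       AVG(gap_days)         AS avg_days_between,
       STDDEV_SAMP(gap_days) AS stddev_days        -- NULL when there are fewer than two gaps
FROM gaps
GROUP BY client_id
ORDER BY client_id;
```

Because AVG and STDDEV_SAMP ignore NULL inputs, no CASE guard is needed here: clients with a single transaction get NULL for both values, and clients with exactly two transactions get an average but a NULL standard deviation. Wrap the columns in COALESCE(..., 0) if zeros are preferred.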