Calculating the average time between dates in SQL

Date: 2020-09-30 20:43:11

Tags: mysql sql datetime mariadb window-functions

Using MySQL, I'm trying to answer this question: on average, how much time passes before a user creates their Nth project?

Expected result:

| Project count | Average # months | Meaning |
|---------------|------------------|---------|
| 1             | 0                | On average, it took 0 months to create the first project (nothing to compare to) |
| 2             | 12               | On average, it takes a user 12 months to create their second project |
| 3             | 3                | On average, it takes a user 3 months to create their third project |
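One way to state the metric precisely, and the reading the answer below adopts, is the mean gap between each user's (N−1)th and Nth projects, taken over only the users who have at least N projects. The symbols here are introduced for illustration: $t_{u,k}$ is the creation time of user $u$'s $k$-th project in chronological order, and $U_n$ is the set of users with at least $n$ projects.

```latex
\bar{\Delta}_n =
\begin{cases}
0, & n = 1,\\[4pt]
\dfrac{1}{|U_n|} \displaystyle\sum_{u \in U_n} \bigl(t_{u,n} - t_{u,n-1}\bigr), & n \ge 2.
\end{cases}
```

Under this reading a user with a single project contributes only to $\bar{\Delta}_1$, which is 0 by convention.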

My MySQL table stores the projects users have created. It can be summarized as:

| user_id | project created at |
|---------|--------------------|
| 1       | Jan 1, 2020 1:00 pm|
| 1       | Feb 2, 2020 3:45 am|
| 1       | Nov 6, 2020 0:01 am|
| 1       | Mar 4, 2021 5:01 pm|
| 2       | Another timestamp  |
| 2       | Another timestamp  |
| 2       | Another timestamp  |
| 2       | Another timestamp  |
| 2       | Another timestamp  |
| 2       | Another timestamp  |
| ...     | Another timestamp  |
| ...     | Another timestamp  |

Some users have only one project, while others may have hundreds.

EDIT: current implementation

with
    paid_self_serve_projects_presentation as (
        select
            `Paid Projects`.owner_email as `Owner Email`,
            -- 1, 2, 3, ... per owner, in creation order
            row_number() over (partition by `Paid Projects`.owner_uuid order by `Paid Projects`.created_at) as `Project Count`,
            day(`Paid Projects`.created_at) as `Created Day`,
            month(`Paid Projects`.created_at) as `Created Month`,
            year(`Paid Projects`.created_at) as `Created Year`,
            `Paid Projects`.created_at as `Created`
        from self_service_paid_projects as `Paid Projects`
        -- note: an ORDER BY inside a CTE does not guarantee the order of the final result
        order by `Paid Projects`.owner_uuid, `Paid Projects`.created_at
    )

select `Projects`.* from paid_self_serve_projects_presentation as `Projects`
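A minimal sketch of how this CTE could be taken the rest of the way, using the lag() approach from the answer below on the question's own table and columns (`self_service_paid_projects`, `owner_uuid`, `created_at`). The CTE name `gaps` and the column `months_since_previous` are hypothetical, and the month figure relies on timestampdiff(), which counts whole months only.

```sql
-- Sketch (MySQL 8.0+): average gap before each owner's Nth project, in whole months.
-- "gaps" and "months_since_previous" are hypothetical names chosen for illustration.
with gaps as (
    select
        owner_uuid,
        -- 1, 2, 3, ... per owner, in creation order
        row_number() over (partition by owner_uuid order by created_at) as project_count,
        -- months since the same owner's previous project; lag() is null for the
        -- first project, so coalesce() falls back to the row's own timestamp (gap 0)
        timestampdiff(
            month,
            coalesce(lag(created_at) over (partition by owner_uuid order by created_at), created_at),
            created_at
        ) as months_since_previous
    from self_service_paid_projects
)
select
    project_count as `Project Count`,
    avg(months_since_previous) as `Average # months`
from gaps
group by project_count
order by project_count;
```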

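1 Answer

Answer 0 (score: 2)

You can use window functions. I would use row_number() to number each user's projects in order of creation date, and lag() to get the date on which the user's previous project was created: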

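select rn, avg(datediff(created_at, lag_created_at)) avg_diff_days
from (
    select t.*,
        -- number each user's projects in creation order: 1, 2, 3, ...
        row_number() over(partition by user_id order by created_at) rn,
        -- date of the user's previous project; defaults to the row's own date, so project 1 gets a gap of 0
        lag(created_at, 1, created_at) over(partition by user_id order by created_at) lag_created_at
    from mytable t
) t
group by rn
order by rn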
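To see what this returns, here is a self-contained sketch that feeds user 1's four sample rows from the question through the same query. The rows are supplied inline through a CTE named `mytable` so no real table is needed; `coalesce(lag(...), created_at)` is used in place of the three-argument lag() and yields the same 0 gap for the first project.

```sql
-- Sketch (MySQL 8.0+): user 1's sample rows from the question, supplied inline.
with mytable as (
    select 1 as user_id, timestamp '2020-01-01 13:00:00' as created_at
    union all select 1, timestamp '2020-02-02 03:45:00'
    union all select 1, timestamp '2020-11-06 00:01:00'
    union all select 1, timestamp '2021-03-04 17:01:00'
)
select rn, avg(datediff(created_at, lag_created_at)) as avg_diff_days
from (
    select t.*,
        row_number() over (partition by user_id order by created_at) as rn,
        -- same result as lag(created_at, 1, created_at): the first project compares to itself
        coalesce(lag(created_at) over (partition by user_id order by created_at), created_at) as lag_created_at
    from mytable t
) t
group by rn
order by rn;
-- datediff() ignores the time of day, so the gaps are 0, 32, 278 and 118 days for rn = 1..4
```

With only one user loaded, each average is just that user's gap; with more users, each row averages over everyone who has at least that many projects.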

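This gives you the average difference in days, which is more precise than a month count. If you really want months, use timestampdiff(month, lag_created_at, created_at) instead of datediff(), but be aware that timestampdiff() returns an integer: each gap is truncated to whole months before averaging, so you lose precision.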
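A sketch that reports both month figures side by side, assuming the same `mytable(user_id, created_at)` as the query above. The second column is an approximation, not something the answer proposes: it divides the average gap in days by an assumed mean month length of 30.4375 days (365.25 / 12), which keeps the fractional part instead of truncating each gap.

```sql
-- Sketch (MySQL 8.0+): average gap per project number, in months, computed two ways.
-- Assumes mytable(user_id, created_at) as in the query above.
select
    rn,
    -- each gap truncated to whole months first, then averaged
    avg(timestampdiff(month, lag_created_at, created_at)) as avg_diff_months_whole,
    -- average gap in days divided by an assumed mean month length (365.25 / 12 = 30.4375)
    avg(datediff(created_at, lag_created_at)) / 30.4375 as avg_diff_months_approx
from (
    select t.*,
        row_number() over (partition by user_id order by created_at) as rn,
        lag(created_at, 1, created_at) over (partition by user_id order by created_at) as lag_created_at
    from mytable t
) t
group by rn
order by rn;
```

For example, user 1's gap from Nov 6, 2020 0:01 am to Mar 4, 2021 5:01 pm is 118 days: timestampdiff() reports 3 months, while 118 / 30.4375 comes to about 3.88.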