使用MySQL,我试图弄清楚该问题的答案:用户创建第N个项目的平均间隔时间是多少?
预期结果:
| project count | Average # months |
| 1 | 0 | # On average, it took 0 months to create the first project (nothing to compare to)
| 2 | 12 | # On average, it takes a user 12 months to create their second project
| 3 | 3 | # On average, it takes a user 3 months to create their third project
我的MySQL表代表用户创建的项目。该表可以总结为:
| user_id | project created at |
|---------|--------------------|
| 1 | Jan 1, 2020 1:00 pm|
| 1 | Feb 2, 2020 3:45 am|
| 1 | Nov 6, 2020 0:01 am|
| 1 | Mar 4, 2021 5:01 pm|
|------------------------------|
| 2 | Another timestamp |
| 2 | Another timestamp |
| 2 | Another timestamp |
| 2 | Another timestamp |
| 2 | Another timestamp |
| 2 | Another timestamp |
|------------------------------|
| ... | Another timestamp |
| ... | Another timestamp |
有些用户将只有一个项目,而有些可能会有数百个项目。
with
paid_self_serve_projects_presentation as (
select
`Paid Projects`.owner_email
`Owner Email`,
row_number() over (partition by `Paid Projects`.owner_uuid order by created_at)
`Project Count`,
day(`Paid Projects`.created_at)
`Created Day`,
month(`Paid Projects`.created_at)
`Created Month`,
year(`Paid Projects`.created_at)
`Created Year`,
`Paid Projects`.created_at
`Created`
from self_service_paid_projects as `Paid Projects`
order by `Paid Projects`.owner_uuid, `Paid Projects`.created_at
)
select `Projects`.* from paid_self_serve_projects_presentation as `Projects`
答案 0 :(得分:2)
您可以使用窗口功能。我正在考虑row_number()
枚举按创建日期排序的每个用户的项目,而lag()
来获取创建上一个项目的日期:
select rn, avg(datediff(created_at, lag_created_at)) avg_diff_days
from (
select t.*,
row_number() over(partition by user_id order by created_at) rn,
lag(created_at, 1, created_at) over(partition by user_id order by created_at) lag_created_at
from mytable t
) t
group by rn
这为您提供了以天为单位的平均差异,这在某种程度上可以使该月份的准确性更高。如果您真的想要几个月,请使用timestampdiff(month, lag_created_at, created_at)
而不是datediff()
-但要注意,该函数返回整数值,因此会降低精度。