Here is the code I use to calculate the average session duration per user.
SELECT
tbl.create_Date
,HourOfDay
,HourOfDay_AMPM
,AVG(TIMESTAMPDIFF(SECOND, tbl.minDt, tbl.maxDt))/60 AS Duration_mins
FROM (SELECT
i.session_id,
i.createDate,
DATE(CONVERT_TZ(i.createDate, '+00:00', '-04:00')) as create_Date,
HOUR(CONVERT_TZ(i.createDate, '+00:00', '-04:00')) as HourOfDay,
DATE_FORMAT(CONVERT_TZ(i.createDate,'+00:00','-04:00'), '%l%p') as HourOfDay_AMPM,
min(i.createDate) minDt,
max(i.createDate) maxDt,
(max(i.createDate) - min(i.createDate) )/60 as Duration
FROM impressions i
WHERE i.createDate >= current_date
AND HOUR(CONVERT_TZ(i.createDate, '+00:00', '-04:00')) >=9
AND HOUR(CONVERT_TZ(i.createDate, '+00:00', '-04:00')) < 22
AND i.session_Id <> ''
GROUP BY i.session_id
HAVING Duration > 0
ORDER BY i.createDate, i.session_id
) as tbl
GROUP BY tbl.create_DATE, tbl.HourOfDay
ORDER by tbl.create_Date
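Note that the database time zone is UTC and I need to display the results in EST, which is why I use CONVERT_TZ.
Problem: I ran the inner query on its own, pasted the raw data into Excel, built a pivot table, and got the following results: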
Hour Avg_duration_mins
9AM 14.43
10AM 59.17
11AM 24.55
12PM 12.69
2PM 1.27
However, running the whole query as written produces the following results:
Hour Avg_duration_mins
9AM 6.98
10AM 18.78
11AM 9.40
12PM 7.49
2PM 1.21
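After checking by hand, the Excel results are accurate and make sense. Why does the SQL go haywire? I suspect the problem lies in the AVG function combined with the MAX and MIN aggregates.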
Update: in the impressions table, the same session_id can appear in multiple rows:
session_id | createDate | actions |
023awv 2014-10-09 12:02 some action
023awv 2014-10-09 12:12 some action
023awv 2014-10-09 12:22 some action
023awv 2014-10-09 12:32 some action
011awv 2014-10-09 12:42 some action
023awv 2014-10-09 12:42 some action
023awv 2014-10-09 12:52 some action
023awv 2014-10-09 12:53 some action
052brw 2014-10-09 13:02 some action
023awv 2014-10-09 13:05 some action
023awv 2014-10-09 13:06 some action
023awv 2014-10-09 13:08 some action
023awv 2014-10-09 13:12 some action
I want the average duration per session, broken down by hour and by day.
Any help would be greatly appreciated.
Answer 0 (score: 0)
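If your Excel calculation treats (max(i.createDate) - min(i.createDate) )/60 as Duration as a number of minutes, that is wrong. Subtracting two datetimes in MySQL does not yield seconds; it yields a number that reads like an interval written out in digits: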
select timestamp('2014-10-09 14:12') - timestamp('2014-10-09 13:04');
> 10800
That reads as "1 hour 08 minutes 00 seconds" (1:08:00), not 4080 seconds.
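For comparison, here is a minimal sketch of the same two timestamps passed to TIMESTAMPDIFF, the function the outer query already uses. The column aliases diff_seconds and diff_minutes are only illustrative names, and seconds have been added to the literals for clarity.

```sql
-- Elapsed time between the same two timestamps, expressed in real units
SELECT
TIMESTAMPDIFF(SECOND, '2014-10-09 13:04:00', '2014-10-09 14:12:00') AS diff_seconds, -- 4080
TIMESTAMPDIFF(MINUTE, '2014-10-09 13:04:00', '2014-10-09 14:12:00') AS diff_minutes; -- 68
```

Dividing the subtraction result (10800) by 60 would give 180, which is neither the seconds nor the minutes between the two values.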
Your inner query has a GROUP BY, but it also selects columns that are neither grouped nor aggregated. Reduced to its essentials:
select
session_id,
createDate -- this isn't grouped or aggregated
from
impressions i
group by
session_id
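Most databases won't let you do this. MySQL (when ONLY_FULL_GROUP_BY is not enabled) will return whichever createDate it happens to encounter first for each session_id, and that choice is not guaranteed. As a result, your inner query produces unstable results. The query plan used when you run it on its own may differ from the plan used when it runs as part of the full query, so it can end up returning different values in each case.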
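One way to see the problem directly is to ask MySQL to be strict about it. The sketch below assumes you are free to change sql_mode for your own session; note that this statement replaces any other modes for that session, so it is meant for experimentation only. The alias first_seen is an illustrative name.

```sql
-- Assumption: session-level change only; this overrides other sql_mode flags for the session
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- With the mode enabled, this should be rejected (error 1055),
-- because createDate is neither grouped nor aggregated
SELECT session_id, createDate
FROM impressions
GROUP BY session_id;

-- This form is accepted, because createDate is now aggregated
SELECT session_id, MIN(createDate) AS first_seen
FROM impressions
GROUP BY session_id;
```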
Suppose the impressions table contains the following two rows:
session_id | createDate
--------------------------------
1 | 2014-10-09 13:30:00
1 | 2014-10-09 15:30:00
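What should the inner query return? What should the outer query return?
One way to resolve this is to report each session under the hour of its earliest timestamp: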
select
tbl.Create_Date,
tbl.HourOfDay,
tbl.HourOfDay_AMPM,
avg(timestampdiff(second, tbl.minDt, tbl.maxDt))/60 as Duration_mins
from (
select
i.session_id,
date(convert_tz(min(i.createDate), '+00:00', '-04:00')) as create_Date,
hour(convert_tz(min(i.createDate), '+00:00', '-04:00')) as HourOfDay,
date_format(convert_tz(min(i.createDate), '+00:00', '-04:00'), '%l%p') as HourOfDay_AMPM,
min(i.createDate) minDt,
max(i.createDate) maxDt,
timestampdiff(second, min(i.createDate), max(i.createDate))/60 as Duration
from
impressions i
where
i.createDate >= current_date and
hour(convert_tz(i.createDate, '+00:00', '-04:00')) >=9 and
hour(convert_tz(i.createDate, '+00:00', '-04:00')) < 22 and
i.session_Id <> ''
group by
i.session_id
having
Duration > 0
) as tbl
group by
tbl.Create_Date,
tbl.HourOfDay,
tbl.HourOfDay_AMPM
order by
tbl.create_Date,
tbl.HourOfDay
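Here I have basically replaced every non-aggregated occurrence of i.createDate in the inner query with min(i.createDate). This makes the inner query well defined, i.e. there is only one result set it can return.
From reading the MySQL manual, it is hard to work out what an ORDER BY inside the inner query does in this situation. The manual says the outer ORDER BY takes precedence over the inner one, so I have left the inner one out.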
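As a quick check against the two-row example above, here is a minimal sketch that runs the same min-based expressions over an inline derived table. The name sample is hypothetical and stands in for the impressions table; the hour filter and the HAVING clause are left out to keep it short.

```sql
-- Hypothetical inline data: the two example rows for session 1 (UTC)
SELECT
session_id,
HOUR(CONVERT_TZ(MIN(createDate), '+00:00', '-04:00')) AS HourOfDay,
TIMESTAMPDIFF(SECOND, MIN(createDate), MAX(createDate)) / 60 AS Duration_mins
FROM (
SELECT 1 AS session_id, TIMESTAMP('2014-10-09 13:30:00') AS createDate
UNION ALL
SELECT 1, TIMESTAMP('2014-10-09 15:30:00')
) AS sample
GROUP BY session_id;
-- One row: session 1 falls in hour 9 (13:30 UTC is 09:30 at -04:00), lasting 120 minutes
```

With this definition the session is always counted once, in the 9AM bucket, regardless of which row the server reads first.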