AVG(TIMESTAMPDIFF) in MySQL returns the wrong answer

Asked: 2014-10-09 19:25:16

Tags: mysql sql excel timestamp average

Below is the code I use to calculate the average session duration for users.

SELECT 
    tbl.create_Date
   ,HourOfDay
   ,HourOfDay_AMPM

   ,AVG(TIMESTAMPDIFF(SECOND, tbl.minDt, tbl.maxDt))/60 AS Duration_mins

  FROM (SELECT 
           i.session_id,
           i.createDate,
           DATE(CONVERT_TZ(i.createDate, '+00:00', '-04:00')) as create_Date,
           HOUR(CONVERT_TZ(i.createDate, '+00:00', '-04:00')) as HourOfDay,
           DATE_FORMAT(CONVERT_TZ(i.createDate,'+00:00','-04:00'), '%l%p') as HourOfDay_AMPM,
           min(i.createDate) minDt,
           max(i.createDate) maxDt,
           (max(i.createDate) - min(i.createDate) )/60 as Duration
      FROM impressions i 

     WHERE i.createDate >= current_date
     AND HOUR(CONVERT_TZ(i.createDate, '+00:00', '-04:00')) >=9
     AND HOUR(CONVERT_TZ(i.createDate, '+00:00', '-04:00')) < 22
     AND i.session_Id <> ''


     GROUP BY i.session_id
     HAVING Duration > 0
     ORDER BY i.createDate, i.session_id

        ) as tbl
 GROUP BY  tbl.create_DATE, tbl.HourOfDay

 ORDER by tbl.create_Date

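Note that the time zone in the database is UTC and I need to display the results in EST, which is why I use CONVERT_TZ.

Problem: I ran the inner query on its own, pasted the raw data into Excel, built a pivot table, and got the following results: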

Hour    Avg_duration_mins
9AM     14.43
10AM    59.17
11AM    24.55
12PM    12.69
2PM     1.27

However, running the whole query as is produces the following results:

 Hour    Avg_duration_mins
 9AM    6.98
10AM    18.78
11AM    9.40
12PM    7.49
 2PM    1.21

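After checking by hand, the Excel results are accurate and make sense. Why does the SQL go wrong? I suspect the problem lies in the AVG function combined with the MAX/MIN aggregation.

Update: the impressions table can contain multiple entries with the same session_id: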
session_id     |   createDate     |    actions     |
   023awv        2014-10-09 12:02     some action
   023awv        2014-10-09 12:12     some action
   023awv        2014-10-09 12:22     some action
   023awv        2014-10-09 12:32     some action
   011awv        2014-10-09 12:42     some action
   023awv        2014-10-09 12:42     some action
   023awv        2014-10-09 12:52     some action
   023awv        2014-10-09 12:53     some action
   052brw        2014-10-09 13:02     some action
   023awv        2014-10-09 13:05     some action
   023awv        2014-10-09 13:06     some action
   023awv        2014-10-09 13:08     some action
   023awv        2014-10-09 13:12     some action

I want the average duration per session, per hour and per day.

Any help would be much appreciated.

1 answer:

Answer 0 (score: 0)

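If your Excel calculation used (max(i.createDate) - min(i.createDate) )/60 as Duration as the number of minutes, then it is wrong. Subtracting one datetime from another does not yield seconds. MySQL treats each value as a number of the form YYYYMMDDhhmmss and subtracts those, so the result looks like some kind of interval representation: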

select timestamp('2014-10-09 14:12') - timestamp('2014-10-09 13:04');

> 10800

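This reads as "1 hour 8 minutes" (1:08:00), not 4080 seconds.

Your inner query has a GROUP BY but also selects columns that are neither grouped nor aggregated. Put simply: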
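To make the difference concrete, here is a minimal sketch that puts the raw subtraction next to TIMESTAMPDIFF for the same pair of values. The column aliases are made up for illustration, and the literals include seconds only to keep the datetime format unambiguous.

```sql
-- Compare raw datetime subtraction with TIMESTAMPDIFF on the same values.
SELECT
    -- Numeric subtraction of YYYYMMDDhhmmss values: gives 10800, not seconds.
    TIMESTAMP('2014-10-09 14:12:00') - TIMESTAMP('2014-10-09 13:04:00') AS raw_subtraction,
    -- Elapsed time in seconds: 68 minutes * 60 = 4080.
    TIMESTAMPDIFF(SECOND, '2014-10-09 13:04:00', '2014-10-09 14:12:00') AS diff_seconds,
    -- The same elapsed time expressed in minutes.
    TIMESTAMPDIFF(SECOND, '2014-10-09 13:04:00', '2014-10-09 14:12:00') / 60 AS diff_minutes;
```

Dividing raw_subtraction by 60 would give 180, which is why a Duration column built that way should not be read as minutes.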

select
    session_id,
    createDate -- this isn't grouped or aggregated
from
    impressions i
group by
    session_id

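Most databases do not allow this. MySQL, unless the ONLY_FULL_GROUP_BY SQL mode is enabled, accepts it and returns the createDate from an arbitrary row of each session_id group (often the first one it reads). Your inner query therefore produces unstable results: the query plan used when it runs on its own may differ from the plan used when it runs as part of the full query, so it can end up returning different values in each case.

Suppose the impressions table contains the following two rows: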
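One way to have MySQL reject such queries instead of silently picking a value is the ONLY_FULL_GROUP_BY SQL mode. The sketch below is only an illustration: it replaces the session's whole sql_mode, which may not be what you want on a server that relies on other modes.

```sql
-- Assumption: it is acceptable to overwrite the sql_mode for this session only.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- With the mode enabled, this statement is expected to fail with an error
-- about a nonaggregated column, because createDate is neither grouped nor aggregated.
SELECT
    session_id,
    createDate
FROM
    impressions
GROUP BY
    session_id;
```

The mode is enabled by default from MySQL 5.7.5 onward, so newer servers would refuse the original inner query outright.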

session_id | createDate
--------------------------------
         1 | 2014-10-09 13:30:00
         1 | 2014-10-09 15:30:00

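What should the inner query return? What should the outer query return?

One way to resolve this is to report results based on the earliest date of each session: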
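The ambiguity can be reproduced with a small sketch. The table name impressions_demo is hypothetical and only holds the two rows above. At an offset of -04:00 the session starts in the 9 o'clock hour and ends in the 11 o'clock hour, so a bare createDate could place it in either bucket.

```sql
-- Hypothetical scratch table containing only the two example rows.
CREATE TEMPORARY TABLE impressions_demo (
    session_id VARCHAR(16) NOT NULL,
    createDate DATETIME    NOT NULL
);

INSERT INTO impressions_demo (session_id, createDate) VALUES
    ('1', '2014-10-09 13:30:00'),
    ('1', '2014-10-09 15:30:00');

SELECT
    session_id,
    -- 13:30 UTC is 09:30 at -04:00, so the session starts in hour 9.
    HOUR(CONVERT_TZ(MIN(createDate), '+00:00', '-04:00')) AS first_hour,
    -- 15:30 UTC is 11:30 at -04:00, so the session ends in hour 11.
    HOUR(CONVERT_TZ(MAX(createDate), '+00:00', '-04:00')) AS last_hour,
    -- Two hours between the first and last impression.
    TIMESTAMPDIFF(SECOND, MIN(createDate), MAX(createDate)) / 60 AS duration_mins
FROM
    impressions_demo
GROUP BY
    session_id;
```

Until the query states which of the two hours a session belongs to, the hourly average is not well defined.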

select
    tbl.Create_Date,
    tbl.HourOfDay,
    tbl.HourOfDay_AMPM,
    avg(timestampdiff(second, tbl.minDt, tbl.maxDt))/60 as Duration_mins
from (
    select
       i.session_id,
       date(convert_tz(min(i.createDate), '+00:00', '-04:00')) as create_Date,
       hour(convert_tz(min(i.createDate), '+00:00', '-04:00')) as HourOfDay,
       date_format(convert_tz(min(i.createDate), '+00:00', '-04:00'), '%l%p') as HourOfDay_AMPM,
       min(i.createDate) minDt,
       max(i.createDate) maxDt,
       (max(i.createDate) - min(i.createDate) )/60 as Duration
    from
        impressions i 
    where
        i.createDate >= current_date and
        hour(convert_tz(i.createDate, '+00:00', '-04:00')) >=9 and
        hour(convert_tz(i.createDate, '+00:00', '-04:00')) < 22 and
        i.session_Id <> ''
    group by
        i.session_id
    having
        Duration > 0
    ) as tbl
group by
    tbl.Create_Date,
    tbl.HourOfDay,
    tbl.HourOfDay_AMPM
order by
    tbl.create_Date,
    tbl.HourOfDay

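Here I have essentially replaced every non-aggregated occurrence of i.createDate in the inner query's select list with min(i.createDate). This makes the inner query well defined, i.e. there is only one result set it can return.

From reading the MySQL manual, it is hard to work out what the ORDER BY in the inner query does in this case. The manual says that the outer ORDER BY takes precedence over the inner one.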
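The query above still filters on the subtraction-based Duration column. That works as a "greater than zero" test, but the value itself is not minutes. A possible variation of the inner query, sketched here with a made-up alias duration_secs and without the date and hour filters, computes the duration with TIMESTAMPDIFF directly:

```sql
-- Sketch of the inner query only; duration_secs is an illustrative alias.
select
    i.session_id,
    min(i.createDate) as minDt,
    max(i.createDate) as maxDt,
    -- Elapsed seconds between the first and last impression of the session.
    timestampdiff(second, min(i.createDate), max(i.createDate)) as duration_secs
from
    impressions i
where
    i.session_id <> ''
group by
    i.session_id
having
    duration_secs > 0;
```

With this shape the outer query could average duration_secs / 60 instead of recomputing the difference from minDt and maxDt.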