Question

Suppose I have a table like this:

UID     Name        Datetime                Users
4       Room 4      2012-08-03 14:00:00     3
2       Room 2      2012-08-03 14:00:00     3
3       Room 3      2012-08-03 14:00:00     1
1       Room 1      2012-08-03 14:00:00     2

3       Room 3      2012-08-03 14:15:00     1
2       Room 2      2012-08-03 14:15:00     4
1       Room 1      2012-08-03 14:15:00     3

1       Room 1      2012-08-03 14:30:00     6

1       Room 1      2012-08-03 14:45:00     3
2       Room 2      2012-08-03 14:45:00     7
3       Room 3      2012-08-03 14:45:00     8
4       Room 4      2012-08-03 14:45:00     4

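I want to get the average number of users for each room (1, 2, 3, 4) between 2 p.m. and 3 p.m. The problem is that a room sometimes does not "check in" at a 15-minute interval, so it must be assumed that the previous known user count is still valid.

For example, Room 4 never checked in at 2012-08-03 14:15:00, so it must be assumed that Room 4 had 3 users at 2012-08-03 14:15:00, because that is what it had at 2012-08-03 14:00:00.

Following from this, the average numbers of users I am looking for are:

Room 1: (2 + 3 + 6 + 3) / 4 = 3.5
Room 2: (3 + 4 + 4# + 7) / 4 = 4.5
Room 3: (1 + 1 + 1# + 8) / 4 = 2.75
Room 4: (3 + 3# + 3# + 4) / 4 = 3.25

where # marks an assumed number based on the previous known check-in.

I am wondering whether this can be done with SQL alone. If not, I am curious about an ingenious PHP solution that is not just brute-force math, such as my quick, inaccurate pseudocode: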

foreach ($rooms_id_array as $room_id) {
    // query() stands in for whatever database helper is in use
    $SQL = "SELECT * FROM `table` WHERE `UID` = $room_id AND `Datetime` >= '2012-08-03 14:00:00' AND `Datetime` <= '2012-08-03 15:00:00'";
    $result = query($SQL);
    if ( count($result) < 4 ) {
        // go through each date and find what is missing, and then go to previous date and use that instead
    } else {
        $sum = 0;
        foreach ($result as $row)
            $sum += $row['Users'];
        $avg = $sum / 4;
    }

}
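
For reference, the carry-forward rule that the pseudocode leaves as a comment can be sketched as follows. This is a minimal illustration in Python rather than PHP, using the sample rows from the table above; the names `CHECKINS`, `SLOTS`, and `forward_filled_averages` are made up for this sketch.

```python
from collections import defaultdict

# Sample check-ins from the question: (room UID, time slot, users).
CHECKINS = [
    (4, "14:00", 3), (2, "14:00", 3), (3, "14:00", 1), (1, "14:00", 2),
    (3, "14:15", 1), (2, "14:15", 4), (1, "14:15", 3),
    (1, "14:30", 6),
    (1, "14:45", 3), (2, "14:45", 7), (3, "14:45", 8), (4, "14:45", 4),
]
SLOTS = ["14:00", "14:15", "14:30", "14:45"]


def forward_filled_averages(checkins, slots):
    """Average users per room, reusing the previous count for missing slots."""
    by_room = defaultdict(dict)
    for uid, slot, users in checkins:
        by_room[uid][slot] = users

    averages = {}
    for uid, seen in by_room.items():
        last_known = None
        filled = []
        for slot in slots:
            # Keep the previous value when this slot has no check-in.
            last_known = seen.get(slot, last_known)
            if last_known is not None:  # skip slots before the first check-in
                filled.append(last_known)
        if filled:
            averages[uid] = sum(filled) / len(filled)
    return averages


averages = forward_filled_averages(CHECKINS, SLOTS)
for uid in sorted(averages):
    print(f"Room {uid}: {averages[uid]}")
```

The sketch assumes the slots are listed in chronological order and that each room has at most one check-in per slot.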

Answer 1

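Your difficulty (and the most expensive step) will be filling in the blanks. If you cannot fill in the blanks in the source data, you probably want a template to join against, then a correlated sub-query to find the data associated with each row of that template.

This is usually best done with real tables, but here is an example that uses hard-coded inline views instead...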

SELECT
  `room`.`uid`           `uid` ,
  AVG(`data`.`users`)    `average_users`
FROM
  (SELECT 1 `UID`  UNION ALL
   SELECT 2 `UID`  UNION ALL
   SELECT 3 `UID`  UNION ALL
   SELECT 4 `UID`)                                     `room`
CROSS JOIN
  (SELECT '2012-08-03 14:00:00' `datetime`  UNION ALL
   SELECT '2012-08-03 14:15:00' `datetime`  UNION ALL
   SELECT '2012-08-03 14:30:00' `datetime`  UNION ALL
   SELECT '2012-08-03 14:45:00' `datetime`)            `checkin`
LEFT JOIN
  data
    ON  `data`.`uid`      = `room`.`uid`
    AND `data`.`datetime` = (SELECT MAX(`datetime`)
                               FROM `data`
                              WHERE `uid`       = `room`.`uid`
                                AND `datetime` <= `checkin`.`datetime`)
GROUP BY
  `room`.`uid`

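- The CROSS JOIN creates the template, so that there is always a record for every check-in slot of every room.

- The correlated sub-query searches back through time to find the most recent check-in for that room at or before that slot.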
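
To try the template-plus-correlated-sub-query idea without a MySQL server, here is a hedged sketch that runs a query of the same shape against an in-memory SQLite database through Python's `sqlite3` module. The column name `ts` (instead of `datetime`), the aliases `d` and `prev`, and the constants `ROWS` and `QUERY` are assumptions of this sketch, not part of the answer above.

```python
import sqlite3

# Sample rows from the question: (uid, name, ts, users).
ROWS = [
    (4, "Room 4", "2012-08-03 14:00:00", 3),
    (2, "Room 2", "2012-08-03 14:00:00", 3),
    (3, "Room 3", "2012-08-03 14:00:00", 1),
    (1, "Room 1", "2012-08-03 14:00:00", 2),
    (3, "Room 3", "2012-08-03 14:15:00", 1),
    (2, "Room 2", "2012-08-03 14:15:00", 4),
    (1, "Room 1", "2012-08-03 14:15:00", 3),
    (1, "Room 1", "2012-08-03 14:30:00", 6),
    (1, "Room 1", "2012-08-03 14:45:00", 3),
    (2, "Room 2", "2012-08-03 14:45:00", 7),
    (3, "Room 3", "2012-08-03 14:45:00", 8),
    (4, "Room 4", "2012-08-03 14:45:00", 4),
]

QUERY = """
SELECT room.uid, AVG(d.users) AS average_users
FROM (SELECT 1 AS uid UNION ALL
      SELECT 2        UNION ALL
      SELECT 3        UNION ALL
      SELECT 4) AS room
CROSS JOIN (SELECT '2012-08-03 14:00:00' AS ts UNION ALL
            SELECT '2012-08-03 14:15:00'       UNION ALL
            SELECT '2012-08-03 14:30:00'       UNION ALL
            SELECT '2012-08-03 14:45:00') AS checkin
LEFT JOIN data AS d
       ON d.uid = room.uid
      AND d.ts = (SELECT MAX(prev.ts)          -- latest check-in so far
                    FROM data AS prev
                   WHERE prev.uid = room.uid
                     AND prev.ts <= checkin.ts)
GROUP BY room.uid
ORDER BY room.uid
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (uid INTEGER, name TEXT, ts TEXT, users INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", ROWS)

results = conn.execute(QUERY).fetchall()
for uid, average_users in results:
    print(uid, average_users)
conn.close()
```

The timestamps are stored as ISO-formatted text, so plain string comparison orders them chronologically; that is why `MAX` and `<=` behave as intended here.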

Answer 2

You can use this solution:

SELECT   b.Name, 
         AVG(b.Users) avg_users
FROM     (
         SELECT     a.UID, 
                    MAX(c.Datetime) last_date
         FROM       (SELECT DISTINCT UID FROM tbl) a
         CROSS JOIN (
                    SELECT '14:00:00' intrvl UNION ALL
                    SELECT '14:15:00'        UNION ALL
                    SELECT '14:30:00'        UNION ALL
                    SELECT '14:45:00'
                    ) b
         JOIN       tbl c ON a.UID           = c.UID
                         AND TIME(b.intrvl) >= TIME(c.Datetime)
         GROUP BY   a.UID,
                    b.intrvl
         ) a
JOIN     tbl b ON a.UID       = b.UID
              AND a.last_date = b.Datetime
GROUP BY b.UID,
         b.Name

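Query breakdown:

Step 1:

The first thing we need to do is associate each room with each time interval. For example, in your sample data, Room 4 has no association with the 14:15:00 and 14:30:00 intervals, but we still need to represent those associations somehow.

We do this by creating a Cartesian product of each distinct room with the relevant time intervals: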

SELECT     a.UID, 
           b.intrvl
FROM       (SELECT DISTINCT UID FROM tbl) a
CROSS JOIN (
           SELECT '14:00:00' intrvl UNION ALL
           SELECT '14:15:00'        UNION ALL
           SELECT '14:30:00'        UNION ALL
           SELECT '14:45:00'
           ) b
ORDER BY   b.intrvl, a.UID DESC -- Ordering for display purposes

Renders:

UID | intrvl
--------------
4   | 14:00:00
3   | 14:00:00
2   | 14:00:00
1   | 14:00:00
4   | 14:15:00
3   | 14:15:00
2   | 14:15:00
1   | 14:15:00
4   | 14:30:00
3   | 14:30:00
2   | 14:30:00
1   | 14:30:00
4   | 14:45:00
3   | 14:45:00
2   | 14:45:00
1   | 14:45:00

SQLFiddle Demo
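
As a small illustration of what the Cartesian product in Step 1 produces, the same pairing can be sketched in Python with `itertools.product`. The lists `room_uids` and `intervals` are hard-coded here as an assumption; in the query they come from `SELECT DISTINCT UID` and the `UNION ALL` view.

```python
from itertools import product

room_uids = [4, 3, 2, 1]
intervals = ["14:00:00", "14:15:00", "14:30:00", "14:45:00"]

# Interval-major order with UIDs descending, mirroring the ORDER BY above.
pairs = [(uid, intrvl) for intrvl, uid in product(intervals, room_uids)]

print("UID | intrvl")
for uid, intrvl in pairs:
    print(f"{uid}   | {intrvl}")
```

Every room is paired with every interval, so 4 rooms and 4 intervals give 16 rows, including the pairs for which no check-in exists.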

Step 2:

Then, once we have those associations, we join the result back onto the main table (tbl), on the condition that the time part of the main table's Datetime field is less than or equal to the Cartesian-joined time for each UID. For each UID -> intrvl association, this shows all entries that occurred on or before the intrvl time.

For example, since Room 3 has no entry for the 14:30:00 intrvl, only two entries will join to it: the ones at 14:15:00 and 14:00:00, since both occurred on or before the intrvl time.

You can now see where we are going with this. The result of this step gives us access to the most recent entry for each intrvl.

SELECT     a.UID, 
           b.intrvl,
           c.*
FROM       (SELECT DISTINCT UID FROM tbl) a
CROSS JOIN (
           SELECT '14:00:00' intrvl UNION ALL
           SELECT '14:15:00'        UNION ALL
           SELECT '14:30:00'        UNION ALL
           SELECT '14:45:00'
           ) b
JOIN       tbl c ON a.UID           = c.UID
                AND TIME(b.intrvl) >= TIME(c.Datetime)
ORDER BY   b.intrvl, a.UID DESC, c.Datetime -- Ordering for display purposes

Renders (excluding the Name column):

UID | intrvl   | Datetime            | Users
-------------------------------------------------
4   | 14:00:00 | 2012-08-03 14:00:00 | 3   <-- Most recent entry up until 14:00:00
3   | 14:00:00 | 2012-08-03 14:00:00 | 1   <-- Most recent entry up until 14:00:00
2   | 14:00:00 | 2012-08-03 14:00:00 | 3   <-- Most recent entry up until 14:00:00
1   | 14:00:00 | 2012-08-03 14:00:00 | 2   <-- Most recent entry up until 14:00:00
4   | 14:15:00 | 2012-08-03 14:00:00 | 3   <-- Most recent entry up until 14:15:00
3   | 14:15:00 | 2012-08-03 14:00:00 | 1
3   | 14:15:00 | 2012-08-03 14:15:00 | 1   <-- Most recent entry up until 14:15:00
2   | 14:15:00 | 2012-08-03 14:00:00 | 3
2   | 14:15:00 | 2012-08-03 14:15:00 | 4   <-- Most recent entry up until 14:15:00
1   | 14:15:00 | 2012-08-03 14:00:00 | 2
1   | 14:15:00 | 2012-08-03 14:15:00 | 3   <-- Most recent entry up until 14:15:00
4   | 14:30:00 | 2012-08-03 14:00:00 | 3   <-- Most recent entry up until 14:30:00
3   | 14:30:00 | 2012-08-03 14:00:00 | 1
3   | 14:30:00 | 2012-08-03 14:15:00 | 1   <-- Most recent entry up until 14:30:00
2   | 14:30:00 | 2012-08-03 14:00:00 | 3
2   | 14:30:00 | 2012-08-03 14:15:00 | 4   <-- Most recent entry up until 14:30:00
1   | 14:30:00 | 2012-08-03 14:00:00 | 2
1   | 14:30:00 | 2012-08-03 14:15:00 | 3
1   | 14:30:00 | 2012-08-03 14:30:00 | 6   <-- Most recent entry up until 14:30:00
4   | 14:45:00 | 2012-08-03 14:00:00 | 3
4   | 14:45:00 | 2012-08-03 14:45:00 | 4   <-- Most recent entry up until 14:45:00
3   | 14:45:00 | 2012-08-03 14:00:00 | 1
3   | 14:45:00 | 2012-08-03 14:15:00 | 1
3   | 14:45:00 | 2012-08-03 14:45:00 | 8   <-- Most recent entry up until 14:45:00
2   | 14:45:00 | 2012-08-03 14:00:00 | 3
2   | 14:45:00 | 2012-08-03 14:15:00 | 4
2   | 14:45:00 | 2012-08-03 14:45:00 | 7   <-- Most recent entry up until 14:45:00
1   | 14:45:00 | 2012-08-03 14:00:00 | 2
1   | 14:45:00 | 2012-08-03 14:15:00 | 3
1   | 14:45:00 | 2012-08-03 14:30:00 | 6
1   | 14:45:00 | 2012-08-03 14:45:00 | 3   <-- Most recent entry up until 14:45:00

SQLFiddle Demo

Step 3:

Our next step is to take the result set above and pull only the most recent joined Datetime for each intrvl. We can do this by using GROUP BY in conjunction with the MAX() aggregate function.

Unfortunately, because of how GROUP BY behaves, we cannot also correctly pull the value of Users along with each selected Datetime.

SELECT     a.UID, 
           b.intrvl,
           MAX(c.Datetime) last_date
FROM       (SELECT DISTINCT UID FROM tbl) a
CROSS JOIN (
           SELECT '14:00:00' intrvl UNION ALL
           SELECT '14:15:00'        UNION ALL
           SELECT '14:30:00'        UNION ALL
           SELECT '14:45:00'
           ) b
JOIN       tbl c ON a.UID           = c.UID
                AND TIME(b.intrvl) >= TIME(c.Datetime)
GROUP BY   a.UID,
           b.intrvl
ORDER BY   b.intrvl, a.UID DESC -- Again, for display purposes

Renders:

UID | intrvl   | last_date
---------------------------------------
4   | 14:00:00 | 2012-08-03 14:00:00
3   | 14:00:00 | 2012-08-03 14:00:00
2   | 14:00:00 | 2012-08-03 14:00:00
1   | 14:00:00 | 2012-08-03 14:00:00
4   | 14:15:00 | 2012-08-03 14:00:00
3   | 14:15:00 | 2012-08-03 14:15:00
2   | 14:15:00 | 2012-08-03 14:15:00
1   | 14:15:00 | 2012-08-03 14:15:00
4   | 14:30:00 | 2012-08-03 14:00:00
3   | 14:30:00 | 2012-08-03 14:15:00
2   | 14:30:00 | 2012-08-03 14:15:00
1   | 14:30:00 | 2012-08-03 14:30:00
4   | 14:45:00 | 2012-08-03 14:45:00
3   | 14:45:00 | 2012-08-03 14:45:00
2   | 14:45:00 | 2012-08-03 14:45:00
1   | 14:45:00 | 2012-08-03 14:45:00

SQLFiddle Demo

Step 4:

Now we have to grab the value of Users for each last_date so that we can take the average of those values. We do this by wrapping the query from the last step as a subselect inside the FROM clause and joining yet again onto the main table, on the condition that for each matched UID -> last_date association we grab the value of Users.

SELECT   a.UID,
         a.last_date,
         b.Users
FROM     (
         SELECT     a.UID, 
                    MAX(c.Datetime) last_date
         FROM       (SELECT DISTINCT UID FROM tbl) a
         CROSS JOIN (
                    SELECT '14:00:00' intrvl UNION ALL
                    SELECT '14:15:00'        UNION ALL
                    SELECT '14:30:00'        UNION ALL
                    SELECT '14:45:00'
                    ) b
         JOIN       tbl c ON a.UID           = c.UID
                         AND TIME(b.intrvl) >= TIME(c.Datetime)
         GROUP BY   a.UID,
                    b.intrvl
         ) a
JOIN     tbl b ON a.UID       = b.UID
              AND a.last_date = b.Datetime
ORDER BY a.UID DESC -- Display purposes again

Renders:

UID | last_date           | Users
---------------------------------
4   | 2012-08-03 14:00:00 | 3
4   | 2012-08-03 14:00:00 | 3
4   | 2012-08-03 14:00:00 | 3
4   | 2012-08-03 14:45:00 | 4
3   | 2012-08-03 14:00:00 | 1
3   | 2012-08-03 14:15:00 | 1
3   | 2012-08-03 14:15:00 | 1
3   | 2012-08-03 14:45:00 | 8
2   | 2012-08-03 14:00:00 | 3
2   | 2012-08-03 14:15:00 | 4
2   | 2012-08-03 14:15:00 | 4
2   | 2012-08-03 14:45:00 | 7
1   | 2012-08-03 14:00:00 | 2
1   | 2012-08-03 14:15:00 | 3
1   | 2012-08-03 14:30:00 | 6
1   | 2012-08-03 14:45:00 | 3

SQLFiddle Demo

Step 5:

Now it is just a matter of grouping on each room and averaging the Users column:

SELECT   b.Name, 
         AVG(b.Users) avg_users
FROM     (
         SELECT     a.UID, 
                    MAX(c.Datetime) last_date
         FROM       (SELECT DISTINCT UID FROM tbl) a
         CROSS JOIN (
                    SELECT '14:00:00' intrvl UNION ALL
                    SELECT '14:15:00'        UNION ALL
                    SELECT '14:30:00'        UNION ALL
                    SELECT '14:45:00'
                    ) b
         JOIN       tbl c ON a.UID           = c.UID
                         AND TIME(b.intrvl) >= TIME(c.Datetime)
         GROUP BY   a.UID,
                    b.intrvl
         ) a
JOIN     tbl b ON a.UID       = b.UID
              AND a.last_date = b.Datetime
GROUP BY b.UID,
         b.Name

Averaging the Users values listed per room in Step 4 gives 3.5 for Room 1, 4.5 for Room 2, 2.75 for Room 3, and 3.25 for Room 4, which matches the averages expected in the question.

SQLFiddle Demo of Final Result
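
The grouping logic in the breakdown (take the latest Datetime per UID and interval, then join back to fetch Users) can also be traced outside the database. The following Python sketch is only an illustration under that reading; the names `TBL`, `INTERVALS`, `last_date`, and `avg_users` are assumptions chosen to echo the query's aliases.

```python
from collections import defaultdict

# Sample rows from the question: (UID, Name, Datetime, Users).
TBL = [
    (4, "Room 4", "2012-08-03 14:00:00", 3),
    (2, "Room 2", "2012-08-03 14:00:00", 3),
    (3, "Room 3", "2012-08-03 14:00:00", 1),
    (1, "Room 1", "2012-08-03 14:00:00", 2),
    (3, "Room 3", "2012-08-03 14:15:00", 1),
    (2, "Room 2", "2012-08-03 14:15:00", 4),
    (1, "Room 1", "2012-08-03 14:15:00", 3),
    (1, "Room 1", "2012-08-03 14:30:00", 6),
    (1, "Room 1", "2012-08-03 14:45:00", 3),
    (2, "Room 2", "2012-08-03 14:45:00", 7),
    (3, "Room 3", "2012-08-03 14:45:00", 8),
    (4, "Room 4", "2012-08-03 14:45:00", 4),
]
INTERVALS = ["14:00:00", "14:15:00", "14:30:00", "14:45:00"]

# Steps 1 to 3: for each (UID, intrvl) pair, keep the latest Datetime whose
# time part (characters 11 onward) is at or before the interval.
last_date = {}
for uid in {row[0] for row in TBL}:
    for intrvl in INTERVALS:
        candidates = [dt for u, _name, dt, _users in TBL
                      if u == uid and dt[11:] <= intrvl]
        if candidates:  # like the inner JOIN, pairs with no match drop out
            last_date[(uid, intrvl)] = max(candidates)

# Step 4: join back on (UID, Datetime) to fetch Name and Users.
row_at = {(uid, dt): (name, users) for uid, name, dt, users in TBL}
users_per_room = defaultdict(list)
for (uid, _intrvl), dt in last_date.items():
    name, users = row_at[(uid, dt)]
    users_per_room[name].append(users)

# Step 5: average per room.
avg_users = {name: sum(vals) / len(vals) for name, vals in users_per_room.items()}
for name in sorted(avg_users):
    print(name, avg_users[name])
```

Like the query, this compares only the time part of each Datetime, so it assumes all rows fall on the same day.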

Answer 3

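I was just playing around with MySQL variables and came up with the following idea:

Just calculate the (discrete) integral of the users over time, then divide by the total time.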

SET @avgSum := @lastValue := @lastTime := @firstTime := 0;
SELECT
  *,
  @firstTime := IF(@firstTime = 0, UNIX_TIMESTAMP(`DateTime`), @firstTime),
  @avgSum := @avgSum + (UNIX_TIMESTAMP(`DateTime`) - @lastTime) * @lastValue,
  @lastValue,
  @lastTime,
  @lastValue := `Users`,
  @lastTime := UNIX_TIMESTAMP(`DateTime`),
  @avgSum / (UNIX_TIMESTAMP(`DateTime`) - @firstTime) AS `average`
FROM
  `table`
WHERE
  `UID` = 1 AND
  UNIX_TIMESTAMP(`DateTime`) >= … AND
  UNIX_TIMESTAMP(`DateTime`) < …
ORDER BY
  UNIX_TIMESTAMP(`DateTime`) ASC;

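@firstTime is the timestamp of the first user record, and @avgSum is the sum of users over time (the integral). @lastValue and @lastTime are the value and time of the previous record. The average column is the sum of users over time divided by the whole elapsed interval (never mind the NULL on the first record, which comes from a division by zero).

Two restrictions remain: the first and the last record of the given interval must exist. Without the last one, the average "ends" at the last available record.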
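
To make the running integral concrete, here is a hedged Python sketch of the same bookkeeping for Room 1 only. The variables mirror @firstTime, @lastTime, @lastValue, and @avgSum, but the function name `running_time_weighted_average` and the `ROOM_1` list are assumptions of this sketch.

```python
from datetime import datetime

# Room 1 check-ins from the question: (timestamp, users), in time order.
ROOM_1 = [
    ("2012-08-03 14:00:00", 2),
    ("2012-08-03 14:15:00", 3),
    ("2012-08-03 14:30:00", 6),
    ("2012-08-03 14:45:00", 3),
]


def running_time_weighted_average(records):
    """Return the running average after each record (None for the first)."""
    first_time = last_time = last_value = None
    integral = 0.0  # sum of users * seconds, like @avgSum
    averages = []
    for stamp, users in records:
        now = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if first_time is None:
            first_time = now
        else:
            # The previous value held from last_time until now.
            integral += (now - last_time).total_seconds() * last_value
        elapsed = (now - first_time).total_seconds()
        averages.append(integral / elapsed if elapsed else None)
        last_time, last_value = now, users
    return averages


averages = running_time_weighted_average(ROOM_1)
print(averages)
```

The final value covers only 14:00 to 14:45, because the count recorded at 14:45 never gets a duration. That is the "average ends at the last available record" restriction described above.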

Answer 4

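I think this handles any time range well, even when the check-in intervals are uneven. Also, I think there is an error in your example: in your weighted average, the last value for Room 2 is "4" instead of "7".

Setup: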

if object_id(N'avgTbl', N'U') is not null
drop table avgTbl;

create table avgTbl (
    UserId int not null,
    RoomName nvarchar(10) not null,
    CheckInTime datetime not null,
    UserCount int not null,

    constraint pk_avgTbl primary key (UserId, RoomName, CheckInTime)
);

insert into avgTbl (UserId, RoomName, CheckInTime, UserCount) values
(4, 'Room 4', '2012-08-03 14:00:00', 3),
(2, 'Room 2', '2012-08-03 14:00:00', 3),
(3, 'Room 3', '2012-08-03 14:00:00', 1),
(1, 'Room 1', '2012-08-03 14:00:00', 2),

(3, 'Room 3', '2012-08-03 14:15:00', 1),
(2, 'Room 2', '2012-08-03 14:15:00', 4),
(1, 'Room 1', '2012-08-03 14:15:00', 3),

(1, 'Room 1', '2012-08-03 14:30:00', 6),

(1, 'Room 1', '2012-08-03 14:45:00', 3),
(2, 'Room 2', '2012-08-03 14:45:00', 7),
(3, 'Room 3', '2012-08-03 14:45:00', 8),
(4, 'Room 4', '2012-08-03 14:45:00', 4);

Query:

/* 
* You just need to enter the start and end times below.  
* They can be any intervals, as long as the start time is 
* before the end time.
*/
declare 
    @startTime datetime = '2012-08-03 14:00:00',
    @endTime datetime = '2012-08-03 15:00:00';

declare     
    @totalTime numeric(18,1) = datediff(MINUTE, @startTime, @endTime);

    /*
    * This orders the observations, and assigns a sequential number so we can 
    *join on it later.
    */
with diffs as (
    select 
        row_number() over (order by RoomName, CheckInTime) as RowNum,
        CheckInTime,
        UserCount,
        RoomName
    from avgTbl
),
/*
* Get the time periods, 
* calc the number of minutes, 
* divide by the total minutes in the period, 
* multiply by the UserCount to get the weighted value, 
* sum the weighted values to get the weighted avg.
*/
mins as (
    select 
        cur.RoomName,
        /*
        * If we do not have an observation for a given room, use "0" instead
        * of "null", so it does not affect calculations later.
        */
        case 
            when prv.UserCount is null then 0
            else prv.UserCount
            end as UserCount, 
        /* The current observation time. */            
        cur.CheckInTime as CurrentT,
        /* The prior observation time. */
        prv.CheckInTime as PrevT,
        /*
        * The difference in minutes between the current, and previous observation
        * times.  If it is the first observation, then use the @startTime as the
        * previous observation time.  If the current time is null, then use the
        * end time.
        */
        datediff(MINUTE, 
            case 
                when prv.CheckInTime is null then @startTime 
                else prv.CheckInTime 
                end, 
            case 
                when cur.CheckInTime is null then @endTime 
                else cur.CheckInTime 
                end) as Mins 
    from diffs as cur
        /*
        * Join the observations based on the row numbers.  This gets the current,
        * and previous observations together in the same record, so we can 
        * perform our calculations.
        */
        left outer join diffs as prv on cur.RowNum = prv.RowNum + 1
            and cur.RoomName = prv.RoomName
    union
    /*
    * Add the end date as a period end, assume that the user count is the same 
    * as the last observation.
    */
    select 
        d.RoomName, 
        d.UserCount, 
        @endTime,
        d.CheckInTime, -- The last recorded observation time.
        datediff(MINUTE, d.CheckInTime, @endTime) as Mins
    from diffs as d 
    where d.RowNum in (
        select MAX(d2.RowNum)
        from diffs as d2
        where d2.RoomName = d.RoomName
        )
    group by d.RoomName, d.CheckInTime, d.UserCount
)
/* Now we just need to get our weighted average calculations. */
select 
    m.RoomName, 
    count(1) - 1 as NumOfObservations,
    /*
    * m.Mins = minutes during which "UserCount" is the active number.
    * @totalTime = total minutes between start and end.
    * m.Mins / @totalTime = the % of the total time.
    * (m.Mins / @totalTime) * UserCount = The weighted value.
    * sum(..above..) = The total weighted average across the observations.
    */
    sum((m.Mins/@totalTime) * m.UserCount) as WgtAvg
from mins as m
group by m.RoomName
order by m.RoomName;
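
The weighting arithmetic in the query's comments can be checked with a small Python sketch. It is an illustration only: the function `weighted_average`, the `OBSERVATIONS` dict, and the assumption that every observation lies between the start and end times are mine, not part of the T-SQL above.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Observations per room from the setup data: (CheckInTime, UserCount).
OBSERVATIONS = {
    "Room 1": [("2012-08-03 14:00:00", 2), ("2012-08-03 14:15:00", 3),
               ("2012-08-03 14:30:00", 6), ("2012-08-03 14:45:00", 3)],
    "Room 2": [("2012-08-03 14:00:00", 3), ("2012-08-03 14:15:00", 4),
               ("2012-08-03 14:45:00", 7)],
    "Room 3": [("2012-08-03 14:00:00", 1), ("2012-08-03 14:15:00", 1),
               ("2012-08-03 14:45:00", 8)],
    "Room 4": [("2012-08-03 14:00:00", 3), ("2012-08-03 14:45:00", 4)],
}


def weighted_average(observations, start, end):
    """Weight each count by the share of [start, end] during which it held."""
    start_t = datetime.strptime(start, FMT)
    end_t = datetime.strptime(end, FMT)
    total_minutes = (end_t - start_t).total_seconds() / 60
    points = sorted((datetime.strptime(ts, FMT), count)
                    for ts, count in observations)
    weighted = 0.0
    for i, (t, count) in enumerate(points):
        # A count stays active until the next observation, or until the end.
        next_t = points[i + 1][0] if i + 1 < len(points) else end_t
        minutes = (next_t - t).total_seconds() / 60
        weighted += (minutes / total_minutes) * count
    return weighted


wgt_avg = {room: weighted_average(obs, "2012-08-03 14:00:00", "2012-08-03 15:00:00")
           for room, obs in OBSERVATIONS.items()}
for room in sorted(wgt_avg):
    print(room, wgt_avg[room])
```

As in the query, any time before a room's first observation contributes zero to the weighted sum, and the last observation is assumed to hold until the end time.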

Averaging over a time range with missing data

4 Answers:
