Join with a condition and an aggregate function

Asked: 2009-09-21 22:25:41

Tags: sql sql-server tsql join

I have a table that records workers passing through a gate.

DECLARE @doorStatistics TABLE
( id INT IDENTITY,
[user] VARCHAR(250),
accessDate DATETIME,
accessType VARCHAR(5)
)

Sample records:

INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:02:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:12:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:22:43.000','OUT')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:32:43.000','OUT')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:37:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:42:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:48:43.000','OUT')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:52:43.000','OUT')

What I want is a query that gives the following result (based on the example above):

| user         | date       | inHour   | outHour  |
|--------------|------------|----------|----------|
| John Wayne   | 2009-09-01 | 07:02:43 | 07:48:43 |
| Bruce Willis | 2009-09-01 | 07:12:43 | 07:22:43 |
| John Wayne   | 2009-09-02 | 07:37:43 | 07:48:43 |
| Bruce Willis | 2009-09-02 | 07:42:43 | 07:52:43 |

The query I wrote is as follows:

SELECT [user], accessDate AS [in date], 
    (SELECT MIN(accessDate) 
        FROM @doorStatistics ds2 
        WHERE accessType = 'OUT' 
            AND ds2.accessDate > ds.accessDate 
            AND ds.[user] = ds2.[user]) AS [out date] 
FROM @doorStatistics ds 
WHERE accessType = 'IN'

But this is no good, because when a user forgets to register his/her entrance, it produces something like this:

| user         | date       | inHour   | outHour  |
|--------------|------------|----------|----------|
| John Wayne   | 2009-09-02 | 07:02:43 | 07:48:43 |
| John Wayne   | 2009-09-02 | 07:02:43 | 09:26:43 |

whereas it should be:

| user         | date       | inHour   | outHour  |
|--------------|------------|----------|----------|
| John Wayne   | 2009-09-02 | 07:02:43 | 07:48:43 |
| John Wayne   | 2009-09-02 | NULL     | 09:26:43 |

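The second reason the query is no good is performance. I have more than 200,000 records, and running a SELECT for every row slows the query down.

A possible solution might be to join these two queries: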

SELECT * FROM @doorStatistics WHERE accessType = 'IN'

SELECT * FROM @doorStatistics WHERE accessType = 'OUT'

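but I don't know what join conditions would pick the right dates. Maybe some MAX or MIN function could go there, but I have no idea.

I don't want to create a temporary table and use a cursor.

3 Answers: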

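Answer 0: (score: 1)

When designing a database for temporal events that have a duration, it is better to put the "IN" time and the "OUT" time on the same row.

All the queries you need then become very easy.

See pages 48 and 154 of "Joe Celko's SQL Programming Style", which discuss temporal cohesion.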
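
As a rough illustration of the one-row-per-visit idea, the sketch below assumes a hypothetical table `dbo.DoorVisit` with hypothetical columns `inTime` and `outTime`; none of these names come from the question, and the CHECK constraint is just one plausible safeguard.

```sql
-- Hypothetical table: one row per visit, both timestamps on the same row.
CREATE TABLE dbo.DoorVisit
( id      INT IDENTITY PRIMARY KEY,
  [user]  VARCHAR(250) NOT NULL,
  inTime  DATETIME NULL,   -- NULL when the entry was never registered
  outTime DATETIME NULL,   -- NULL while the user is still inside
  -- Assumed rule: an exit cannot precede its entry.
  CONSTRAINT CK_DoorVisit_Order
      CHECK (inTime IS NULL OR outTime IS NULL OR inTime <= outTime)
);

-- The report from the question becomes a plain single-table SELECT.
SELECT  [user],
        CONVERT(CHAR(10), COALESCE(inTime, outTime), 120) AS [date],    -- yyyy-mm-dd
        CONVERT(CHAR(8), inTime, 108)                     AS inHour,    -- hh:mi:ss
        CONVERT(CHAR(8), outTime, 108)                    AS outHour
FROM    dbo.DoorVisit
ORDER BY [user], COALESCE(inTime, outTime);
```

With this layout the pairing is decided once, when the row is written, so a missed entry simply shows up as a NULL `inTime` instead of having to be inferred at query time.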

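Answer 1: (score: 1)

On the structural level

To improve performance:

  • I suggest renaming the accessDate column to accessDateTime.
  • Then create a PERSISTED computed column named accessDate based on accessDateTime (see below). The index you need will then contain only the accessDate column, which you will use for exact-match comparisons together with user.
  • Make sure you have proper indexes on the table (judging from the code below, you will probably need one on "user", accessDate that includes accessType).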

accessDate column definition:

accessDate AS CONVERT(SMALLDATETIME, CONVERT(CHAR(8), accessDateTime, 112), 112) PERSISTED
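
Put together, the suggested structure might look like the sketch below. It assumes a permanent table (here given the hypothetical name `dbo.DoorStatistics`) rather than the table variable from the question; the index name is also made up for illustration.

```sql
-- Hypothetical permanent table following the suggestions above.
CREATE TABLE dbo.DoorStatistics
( id             INT IDENTITY PRIMARY KEY,
  [user]         VARCHAR(250) NOT NULL,
  accessDateTime DATETIME     NOT NULL,   -- renamed from accessDate
  accessType     VARCHAR(5)   NOT NULL,
  -- Date part only, stored physically so that it can be indexed.
  accessDate AS CONVERT(SMALLDATETIME, CONVERT(CHAR(8), accessDateTime, 112), 112) PERSISTED
);

-- Index on ("user", accessDate) that includes accessType.
CREATE NONCLUSTERED INDEX IX_DoorStatistics_User_Date
    ON dbo.DoorStatistics ([user], accessDate)
    INCLUDE (accessType);
```

To run the query below against this table instead of the table variable, replace `@doorStatistics` with `dbo.DoorStatistics`.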

Now, given that you have done that and you are on SQL Server 2005 or later, this very long query should do the job:

;WITH MatchIN (in_id, out_id)
AS (SELECT      s.id, CASE WHEN COALESCE(y.id, s.id) = s.id THEN x.id ELSE NULL END
    FROM        @doorStatistics s
    LEFT JOIN   @doorStatistics x
            ON  x.id = (SELECT  TOP 1 z.id
                        FROM    @doorStatistics z
                        WHERE   z."user" = s."user"
                            AND z.accessType = 'OUT'
                            AND z.accessDate =  s.accessDate
                            AND z.accessDateTime >= s.accessDateTime
                        ORDER BY z.accessDateTime ASC
                        )
    LEFT JOIN   @doorStatistics y
            ON  y.id = (SELECT  TOP 1 z.id
                        FROM    @doorStatistics z
                        WHERE   z."user" = s."user"
                            AND z.accessType = 'IN'
                            AND z.accessDate =  s.accessDate
                            AND z.accessDateTime >= s.accessDateTime
                            AND z.accessDateTime <= x.accessDateTime
                        ORDER BY z.accessDateTime DESC
                        )
    WHERE       s.accessType = 'IN'
)
,    MatchOUT (out_id, in_id)
AS (SELECT      s.id, CASE WHEN COALESCE(y.id, s.id) = s.id THEN x.id ELSE NULL END
    FROM        @doorStatistics s
    LEFT JOIN   @doorStatistics x
            ON  x.id = (SELECT  TOP 1 z.id
                        FROM    @doorStatistics z
                        WHERE   z."user" = s."user"
                            AND z.accessType = 'IN'
                            AND z.accessDate =  s.accessDate
                            AND z.accessDateTime <= s.accessDateTime
                        ORDER BY z.accessDateTime DESC
                        )
    LEFT JOIN   @doorStatistics y
            ON  y.id = (SELECT  TOP 1 z.id
                        FROM    @doorStatistics z
                        WHERE   z."user" = s."user"
                            AND z.accessType = 'OUT'
                            AND z.accessDate =  s.accessDate
                            AND z.accessDateTime <= s.accessDateTime
                            AND z.accessDateTime >= x.accessDateTime
                        ORDER BY z.accessDateTime ASC
                        )
    WHERE       s.accessType = 'OUT'
)

SELECT  COALESCE(i."user", o."user") AS "user",
        COALESCE(i.accessDate, o.accessDate) AS "date",
        CONVERT(CHAR(10), i.accessDateTime, 108) AS "inHour",
        CONVERT(CHAR(10), o.accessDateTime, 108) AS "outHour"
FROM   (SELECT in_id, out_id FROM MatchIN
        UNION -- UNION (not UNION ALL) also removes the duplicate pairs found by both CTEs
        SELECT in_id, out_id FROM MatchOUT
        ) x
LEFT JOIN   @doorStatistics i
        ON  i.id = x.in_id
LEFT JOIN   @doorStatistics o
        ON  o.id = x.out_id
ORDER BY    "user", "date", "inHour"

To test the handling of missing rows, just comment out some of the INSERT statements in the test data.

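Answer 2: (score: 1)

For each IN record of a given user you need to select the minimum OUT record, while making sure there is no intervening IN record (which would correspond to someone getting IN twice without leaving the building). That requires some moderately tricky SQL (a NOT EXISTS clause, for example). So you will be doing a self-join on the table, plus a NOT EXISTS sub-query on the same table. Just make sure you explicitly alias every reference to the table.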
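
One way to read that description is sketched below, using the table variable from the question. This is an interpretation, not necessarily the query the answer had in mind: a single NOT EXISTS rejects any record of the same user that falls strictly between the IN and the candidate OUT, which both picks the earliest OUT and leaves an IN unmatched when another IN intervenes. The 07:40:43 row is made-up test data.

```sql
DECLARE @doorStatistics TABLE
( id INT IDENTITY,
  [user] VARCHAR(250),
  accessDate DATETIME,
  accessType VARCHAR(5)
);

INSERT INTO @doorStatistics ([user], accessDate, accessType) VALUES ('John Wayne', '2009-09-01T07:02:43', 'IN');
INSERT INTO @doorStatistics ([user], accessDate, accessType) VALUES ('John Wayne', '2009-09-01T07:32:43', 'OUT');
INSERT INTO @doorStatistics ([user], accessDate, accessType) VALUES ('John Wayne', '2009-09-01T07:37:43', 'IN');
-- Hypothetical extra row: a second IN with no OUT in between.
INSERT INTO @doorStatistics ([user], accessDate, accessType) VALUES ('John Wayne', '2009-09-01T07:40:43', 'IN');
INSERT INTO @doorStatistics ([user], accessDate, accessType) VALUES ('John Wayne', '2009-09-01T07:48:43', 'OUT');

SELECT  i.[user],
        i.accessDate AS inDate,
        o.accessDate AS outDate      -- NULL when the IN has no matching OUT
FROM    @doorStatistics AS i
LEFT JOIN @doorStatistics AS o
        ON  o.[user]     = i.[user]
        AND o.accessType = 'OUT'
        AND o.accessDate > i.accessDate
        -- No record of this user (IN or OUT) strictly between the pair.
        AND NOT EXISTS (SELECT 1
                        FROM   @doorStatistics AS x
                        WHERE  x.[user]     = i.[user]
                          AND  x.accessDate > i.accessDate
                          AND  x.accessDate < o.accessDate)
WHERE   i.accessType = 'IN'
ORDER BY i.[user], i.accessDate;
```

With these rows, the 07:02:43 IN pairs with the 07:32:43 OUT, the 07:37:43 IN comes back with a NULL out time, and the 07:40:43 IN pairs with the 07:48:43 OUT. An OUT without a preceding IN does not appear in this result; reporting those would need a mirrored query driven from the OUT rows.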