How to select records based on their sequential order using an SQL query

Time: 2013-07-06 11:08:21

Tags: mysql sql

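I am trying to query some information out of a large data set of connections between clients and servers. Here is sample data for the relevant columns of the table (connection_stats):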

+---------------------+-----------+-----------+---------+
| timestamp           | client_id | server_id | status  |
+---------------------+-----------+-----------+---------+
| 2013-07-06 10:40:30 | 100       | 800       | SUCCESS |
| 2013-07-06 10:40:50 | 101       | 801       | FAILED  |
| 2013-07-06 10:42:00 | 100       | 800       | ABORTED |
| 2013-07-06 10:43:30 | 100       | 801       | SUCCESS |
| 2013-07-06 10:56:00 | 100       | 800       | FAILED  |
+---------------------+-----------+-----------+---------+

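From this table, I am trying to query all instances of connection status "ABORTED" followed by connection status "FAILED" (in timestamp order), for every client_id, server_id pair. I would like to get both records - the one with status "ABORTED" and the one with status "FAILED". There is one such case in the data sample above - for the 100, 800 pair, a "FAILED" status comes immediately after an "ABORTED" one.

I am new to SQL and databases, and I am completely lost on this one. Any pointers on how to approach it would be much appreciated.

The database is MySQL.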

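7 answers:

Answer 0: (score: 2)

Admittedly not very elegant, but it is what I could get working in plain MySQL, without CTEs or ranking functions and without a guaranteed unique row id to work with.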

SELECT aborted.* FROM Table1 aborted JOIN Table1 failed
  ON aborted.server_id = failed.server_id 
 AND aborted.client_id = failed.client_id
 AND aborted.timestamp < failed.timestamp
LEFT JOIN Table1 filler
  ON filler.server_id = aborted.server_id
 AND filler.client_id = aborted.client_id
 AND aborted.timestamp < filler.timestamp
 AND filler.timestamp < failed.timestamp
WHERE filler.timestamp IS NULL
  AND aborted.status = 'ABORTED' AND failed.status = 'FAILED'
UNION
SELECT failed.* FROM Table1 aborted JOIN Table1 failed
  ON aborted.server_id = failed.server_id
 AND aborted.client_id = failed.client_id
 AND aborted.timestamp < failed.timestamp
LEFT JOIN Table1 filler
  ON filler.server_id = aborted.server_id
 AND filler.client_id = aborted.client_id
 AND aborted.timestamp < filler.timestamp
 AND filler.timestamp < failed.timestamp
WHERE filler.timestamp IS NULL
  AND aborted.status = 'ABORTED' AND failed.status = 'FAILED'

An SQLfiddle to test with

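If you are happy with getting the two records on a single row, you can simply select the fields you want from aborted/failed and skip the whole second half of the union (i.e. the query is cut in half).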
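As a rough illustration of that halved, one-row-per-pair form, the sketch below runs the first half of the query against the question's sample rows. It uses SQLite through Python's sqlite3 module purely so that it is self-contained; the table name connection_stats comes from the question, while the aborted_at and failed_at aliases are made up for this example.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("2013-07-06 10:40:30", 100, 800, "SUCCESS"),
    ("2013-07-06 10:40:50", 101, 801, "FAILED"),
    ("2013-07-06 10:42:00", 100, 800, "ABORTED"),
    ("2013-07-06 10:43:30", 100, 801, "SUCCESS"),
    ("2013-07-06 10:56:00", 100, 800, "FAILED"),
]

conn = sqlite3.connect(":memory:")
# Timestamps are stored as ISO-formatted text, which sorts chronologically.
conn.execute(
    "CREATE TABLE connection_stats "
    "(timestamp TEXT, client_id INTEGER, server_id INTEGER, status TEXT)"
)
conn.executemany("INSERT INTO connection_stats VALUES (?, ?, ?, ?)", ROWS)

# Only the first half of the UNION, with columns from both sides on one row.
# aborted_at / failed_at are illustrative aliases, not names from the answer.
PAIR_QUERY = """
SELECT aborted.client_id, aborted.server_id,
       aborted.timestamp AS aborted_at,
       failed.timestamp  AS failed_at
FROM connection_stats aborted
JOIN connection_stats failed
  ON aborted.server_id = failed.server_id
 AND aborted.client_id = failed.client_id
 AND aborted.timestamp < failed.timestamp
LEFT JOIN connection_stats filler
  ON filler.server_id = aborted.server_id
 AND filler.client_id = aborted.client_id
 AND aborted.timestamp < filler.timestamp
 AND filler.timestamp < failed.timestamp
WHERE filler.timestamp IS NULL   -- nothing logged between the two rows
  AND aborted.status = 'ABORTED' AND failed.status = 'FAILED'
"""

pairs = conn.execute(PAIR_QUERY).fetchall()
print(pairs)
```

On the sample data this should produce a single row for the 100, 800 pair, carrying both the ABORTED and the FAILED timestamp.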

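Since I got a comment on the UNION, here is the same thing using a JOIN, assuming the timestamp is unique per client/server combination (this is where a unique row id would help):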

SELECT * FROM Table1 t JOIN
(
 SELECT 
   aborted.server_id asid, aborted.client_id acid, aborted.timestamp ats,
    failed.server_id fsid,  failed.client_id fcid,  failed.timestamp fts
 FROM Table1 aborted JOIN Table1 failed
   ON aborted.server_id = failed.server_id
  AND aborted.client_id = failed.client_id
  AND aborted.timestamp < failed.timestamp
 LEFT JOIN Table1 filler
   ON filler.server_id = aborted.server_id
  AND filler.client_id = aborted.client_id
  AND aborted.timestamp < filler.timestamp
  AND filler.timestamp < failed.timestamp
 WHERE filler.timestamp IS NULL
   AND aborted.status = 'ABORTED' AND failed.status = 'FAILED'
) u
WHERE t.server_id=asid AND t.client_id=acid AND t.timestamp=ats
   OR t.server_id=fsid AND t.client_id=fcid AND t.timestamp=fts
ORDER BY timestamp

An SQLfiddle to test with

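Answer 1: (score: 1)

I am answering this (albeit late) because I want to offer a more general approach. MySQL (before version 8.0) does not have the lag() and lead() functions, but you can emulate them with a subquery. The idea is to look up the next timestamp for the client_id / server_id pair and then join back to the original data to get the full record. This lets you pull as many fields as you like from the "next" record. It also lets you express more complex relationships (say, the "FAILED" must occur within 3 minutes):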

select cs.*, csnext.timestamp as nextTimeStamp, csnext.status as nextStatus
from (select cs.*,
             (select timestamp
              from connection_stats cs2
              where cs2.client_id = cs.client_id and
                    cs2.server_id = cs.server_id and
                    cs2.timestamp > cs.timestamp
              order by cs2.timestamp
              limit 1
             ) as Nexttimestamp
      from connection_stats cs
     ) cs join
     connection_stats csnext
     on csnext.client_id = cs.client_id and
        csnext.server_id = cs.server_id and
        csnext.timestamp = cs.nexttimestamp
where cs.status = 'ABORTED' and
      csnext.status = 'FAILED'

The performance of this kind of query can be improved considerably by an index on connection_stats(client_id, server_id, timestamp).
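For comparison, on engines that do have window functions (MySQL 8.0 and later, or SQLite 3.25 and later as used here), the "next row" lookup can be written with LEAD() directly. The following is only a sketch under that assumption, run through Python's sqlite3 module so that it is self-contained; the index name idx_pair_time and the next_timestamp / next_status aliases are invented for the example.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("2013-07-06 10:40:30", 100, 800, "SUCCESS"),
    ("2013-07-06 10:40:50", 101, 801, "FAILED"),
    ("2013-07-06 10:42:00", 100, 800, "ABORTED"),
    ("2013-07-06 10:43:30", 100, 801, "SUCCESS"),
    ("2013-07-06 10:56:00", 100, 800, "FAILED"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE connection_stats "
    "(timestamp TEXT, client_id INTEGER, server_id INTEGER, status TEXT)"
)
conn.executemany("INSERT INTO connection_stats VALUES (?, ?, ?, ?)", ROWS)

# The index suggested in the answer; the name idx_pair_time is hypothetical.
conn.execute(
    "CREATE INDEX idx_pair_time "
    "ON connection_stats (client_id, server_id, timestamp)"
)

# LEAD() looks one row ahead within each (client_id, server_id) pair,
# ordered by timestamp, replacing the correlated subquery plus join.
LEAD_QUERY = """
SELECT client_id, server_id, timestamp, status, next_timestamp, next_status
FROM (
    SELECT cs.*,
           LEAD(timestamp) OVER (PARTITION BY client_id, server_id
                                 ORDER BY timestamp) AS next_timestamp,
           LEAD(status)    OVER (PARTITION BY client_id, server_id
                                 ORDER BY timestamp) AS next_status
    FROM connection_stats cs
) AS with_next
WHERE status = 'ABORTED' AND next_status = 'FAILED'
ORDER BY timestamp
"""

lead_rows = conn.execute(LEAD_QUERY).fetchall()
print(lead_rows)
```

The window has to be evaluated in a derived table because a window function's result cannot be filtered in the WHERE clause of the same query level.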

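Answer 2: (score: 0)

Not very elegant, but it should work. Based on GROUP_CONCAT()

Demo

-- order the statuses by time inside GROUP_CONCAT so that neighbours
-- in the string are neighbours in time
SELECT client_id, server_id,
       GROUP_CONCAT(status ORDER BY timestamp) AS all_statuses
FROM   statuses
GROUP  BY client_id, server_id
HAVING all_statuses LIKE '%ABORTED,FAILED%'
ORDER  BY MIN(timestamp)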

Answer 3: (score: 0)

select * from `table` t1, `table` t2 where t1.server_id = t2.server_id and t1.status = 'ABORTED' and t2.status = 'FAILED'

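Answer 4: (score: 0)

You can group the statuses and match on their order:

SELECT client_id, server_id,
       GROUP_CONCAT(status ORDER BY `timestamp`) AS abort_fail
FROM   `table`
GROUP  BY client_id, server_id
HAVING abort_fail = 'ABORTED,FAILED'
ORDER  BY MAX(`timestamp`) DESC

Keep in mind when using GROUP_CONCAT that the result is truncated to group_concat_max_len, which defaults to 1024 bytes, so you need to take care of that.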

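Answer 5: (score: 0)

I don't have a MySQL database to test on, but you might give something like this a shot. The MIN() lookup sits in a correlated subquery, because an aggregate cannot be used directly in a join condition.

SELECT aborted.*, failed.*
FROM connection_stats aborted
INNER JOIN connection_stats failed
   ON failed.client_id = aborted.client_id AND failed.server_id = aborted.server_id
  AND failed.status = 'FAILED'
  AND failed.timestamp = (SELECT MIN(nexterror.timestamp) FROM connection_stats nexterror
                          WHERE nexterror.client_id = aborted.client_id
                            AND nexterror.server_id = aborted.server_id
                            AND nexterror.timestamp > aborted.timestamp)
WHERE aborted.status = 'ABORTED'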

Answer 6: (score: 0)

SELECT t0.clientid, t0.serverid
        , t0.logtime AS abort_time
        , t1.logtime AS fail_time
FROM tmp t0
JOIN tmp t1 ON t1.clientid = t0.clientid AND t1.serverid = t0.serverid
        -- t1 after t0
        AND t1.logtime > t0.logtime
WHERE t0.status = 'ABORTED'
AND t1.status = 'FAILED'
        -- no records in between 'aborted' and 'failed'
        -- (not even different 'aborted' and 'failed' records)
AND NOT EXISTS (
        SELECT *
        FROM tmp x
        WHERE x.clientid = t0.clientid AND x.serverid = t0.serverid
        AND x.logtime > t0.logtime
        AND x.logtime < t1.logtime
        )
        ;

Update: if you want to retrieve the two records not joined together but as separate rows, you can do the following:

SELECT t0.*
FROM tmp t0
JOIN (
        SELECT t1.clientid, t1.serverid
        , t1.logtime AS abort_time
        , t2.logtime AS fail_time
        FROM tmp t1
        JOIN tmp t2 ON t2.clientid = t1.clientid AND t2.serverid = t1.serverid
                -- t2 after t1
                AND t2.logtime > t1.logtime
        WHERE t1.status = 'ABORTED'
        AND t2.status = 'FAILED'
                -- no records in between 'aborted' and 'failed'
                -- (not even different 'aborted' and 'failed' records)
        AND NOT EXISTS (
                SELECT *
                FROM tmp x
                WHERE x.clientid = t1.clientid AND x.serverid = t1.serverid
                AND x.logtime > t1.logtime
                AND x.logtime < t2.logtime
                )
        ) two ON two.clientid = t0.clientid AND two.serverid = t0.serverid
                AND (two.abort_time = t0.logtime OR two.fail_time = t0.logtime)
        ;

Or, rewritten equivalently as an EXISTS clause, which is sometimes cleaner because the t1 and t2 tables do not leak into the outer query:

SELECT *
FROM tmp t0
WHERE EXISTS (
        SELECT *
        FROM tmp t1
        JOIN tmp t2 ON t2.clientid = t1.clientid AND t2.serverid = t1.serverid
                -- t2 after t1
                AND t2.logtime > t1.logtime
        WHERE t1.status = 'ABORTED'
        AND t2.status = 'FAILED'
        AND t1.clientid = t0.clientid AND t1.serverid = t0.serverid
                -- parenthesized, because AND binds tighter than OR
        AND (t1.logtime = t0.logtime OR t2.logtime = t0.logtime)
                -- no records in between 'aborted' and 'failed'
                -- (not even different 'aborted' and 'failed' records)
        AND NOT EXISTS (
                SELECT *
                FROM tmp x
                WHERE x.clientid = t1.clientid AND x.serverid = t1.serverid
                AND x.logtime > t1.logtime
                AND x.logtime < t2.logtime
                )
                )
        ;
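To see the EXISTS form return the two rows separately, here is a small sketch that applies it to the question's sample data. It is an adaptation rather than the query exactly as posted: the table and column names are switched from tmp / clientid / serverid / logtime to the question's connection_stats / client_id / server_id / timestamp, the OR condition is parenthesized, and SQLite (via Python's sqlite3 module) stands in for MySQL so the example is self-contained.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("2013-07-06 10:40:30", 100, 800, "SUCCESS"),
    ("2013-07-06 10:40:50", 101, 801, "FAILED"),
    ("2013-07-06 10:42:00", 100, 800, "ABORTED"),
    ("2013-07-06 10:43:30", 100, 801, "SUCCESS"),
    ("2013-07-06 10:56:00", 100, 800, "FAILED"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE connection_stats "
    "(timestamp TEXT, client_id INTEGER, server_id INTEGER, status TEXT)"
)
conn.executemany("INSERT INTO connection_stats VALUES (?, ?, ?, ?)", ROWS)

# Keep a row t0 if it is either half of an ABORTED -> FAILED pair
# with nothing logged in between for the same client/server pair.
EXISTS_QUERY = """
SELECT *
FROM connection_stats t0
WHERE EXISTS (
    SELECT *
    FROM connection_stats t1
    JOIN connection_stats t2
      ON t2.client_id = t1.client_id AND t2.server_id = t1.server_id
     AND t2.timestamp > t1.timestamp
    WHERE t1.status = 'ABORTED'
      AND t2.status = 'FAILED'
      AND t1.client_id = t0.client_id AND t1.server_id = t0.server_id
      AND (t1.timestamp = t0.timestamp OR t2.timestamp = t0.timestamp)
      AND NOT EXISTS (
          SELECT *
          FROM connection_stats x
          WHERE x.client_id = t1.client_id AND x.server_id = t1.server_id
            AND x.timestamp > t1.timestamp
            AND x.timestamp < t2.timestamp
      )
)
ORDER BY t0.timestamp
"""

both_rows = conn.execute(EXISTS_QUERY).fetchall()
for row in both_rows:
    print(row)
```

Like the JOIN variant earlier in the thread, this relies on timestamps being unique within a client/server pair; with duplicate timestamps a unique row id would be the safer way to identify the two rows.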