构造复杂查询以匹配日志文件中的行

时间:2017-12-06 12:48:52

标签: mysql

我在MySQL中有一个日志表,如下所示:

mysql> describe logtable;
+----------+------------------+------+-----+---------+-------+
| Field    | Type             | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+-------+
| id       | int(10) unsigned | NO   |     | NULL    |       |
| logdate  | datetime         | NO   |     | NULL    |       |
| host     | varchar(100)     | NO   |     | NULL    |       |
| action   | varchar(100)     | NO   |     | NULL    |       |
| user     | varchar(100)     | NO   |     | NULL    |       |
| org      | varchar(100)     | NO   |     | NULL    |       |
| location | varchar(1000)    | NO   |     | NULL    |       |
+----------+------------------+------+-----+---------+-------+

以下是其内容的示例(并非显示所有字段):

+----------+---------------------+--------+----------------------+----------+
|    id    |       logdate       | action |         user         | location |
+----------+---------------------+--------+----------------------+----------+
| 13933768 | 2017-01-03 08:42:25 | login  | user1@somewhere.com  | place1   |
| 13934110 | 2017-01-03 08:58:38 | login  | user2@somewhere.com  | place2   |
| 13935532 | 2017-01-03 11:02:31 | logout | user1@somewhere.com  | place1   |
| 13935622 | 2017-01-03 11:11:25 | logout | user2@somewhere.com  | place2   |
| 13935772 | 2017-01-03 11:27:27 | login  | user3@somewhere.com  | place3   |
| 13935942 | 2017-01-03 11:52:16 | login  | user4@somewhere.com  | place4   |
| 13936217 | 2017-01-03 12:25:08 | logout | user3@somewhere.com  | place3   |
| 13936293 | 2017-01-03 12:33:16 | logout | user4@somewhere.com  | place4   |
| 13937676 | 2017-01-03 15:33:59 | login  | user3@somewhere.com  | place5   |
| 13937859 | 2017-01-03 15:51:53 | logout | user3@somewhere.com  | place5   |
| 13942394 | 2017-01-04 08:31:26 | login  | user5@somewhere.com  | place2   |
| 13943946 | 2017-01-04 09:46:04 | login  | user4@somewhere.com  | place4   |
| 13944372 | 2017-01-04 10:17:25 | login  | user4@somewhere.com  | place6   |
| 13944373 | 2017-01-04 10:17:27 | login  | user4@somewhere.com  | place6   |
| 13944374 | 2017-01-04 10:17:29 | login  | user4@somewhere.com  | place6   |
| 13944375 | 2017-01-04 10:19:22 | login  | user4@somewhere.com  | place4   |
| 13944575 | 2017-01-04 10:36:48 | login  | user4@somewhere.com  | place6   |
| 13946830 | 2017-01-04 14:56:36 | login  | user6@somewhere.com  | place7   |
| 13947791 | 2017-01-04 16:41:26 | logout | user5@somewhere.com  | place2   |
| 13947795 | 2017-01-04 16:41:59 | login  | user4@somewhere.com  | place4   |
| 13948181 | 2017-01-04 17:19:19 | logout | user4@somewhere.com  | place7   |
| 13948200 | 2017-01-04 17:22:18 | logout | user4@somewhere.com  | place4   |
| 13948201 | 2017-01-04 17:22:18 | logout | user4@somewhere.com  | place6   |
| 13948824 | 2017-01-04 20:23:15 | login  | user7@somewhere.com  | place8   |
| 13948870 | 2017-01-04 20:44:42 | logout | user7@somewhere.com  | place8   |
| 13949945 | 2017-01-05 02:26:35 | logout | user6@somewhere.com  | place7   |
| 13951697 | 2017-01-05 08:49:37 | login  | user8@somewhere.com  | place6   |
| 13951863 | 2017-01-05 08:56:37 | login  | user9@somewhere.com  | place9   |
| 13951886 | 2017-01-05 08:57:06 | login  | user10@somewhere.com | place9   |
+----------+---------------------+--------+----------------------+----------+

我想制作一张表格,显示每位用户登录各个地点的累积时间。从理论上讲,每个人的loginlogout条目&地方组合应该是成对的,但客户故障和网络奇怪等各种事情有时意味着不匹配。在退出第一个位置之前,也可以登录到第二个位置,因此这些对可以相互重叠。此外,这些对可能不符合以下方式之一:

  • login事件,但未匹配logout
  • 没有匹配logout的{​​{1}}事件(听起来很奇怪,但软件会记录这样的中断登录
  • 多个login个事件(重试),只有一个login

到目前为止,我已经在logoutuser加入表格,并从location次减去logout次。我还确保login事件的ID高于logout事件,因为它们总是在增加。但由于可能有多个条目,我得到所有组合。举个简单的例子:

示例1

login

我天真的做法:

+----------+---------------------+--------+----------------------+----------+
|    id    |       logdate       | action |         user         | location |
+----------+---------------------+--------+----------------------+----------+
| 1        | 2017-01-03 08:42:25 | login  | user1@somewhere.com  | place1   |
| 2        | 2017-01-03 11:02:31 | logout | user1@somewhere.com  | place1   |
| 3        | 2017-01-03 11:27:27 | login  | user1@somewhere.com  | place1   |
| 4        | 2017-01-03 12:25:08 | logout | user1@somewhere.com  | place1   |
+----------+---------------------+--------+----------------------+----------+

给出了3个结果:ids 1→2,1→4和3→4。因为它可能看起来像这样:

示例2

select * from logtable as t1
join logtable as t2
on t1.user = t2.user and t1.location = t2.location
and t1.action = 'login' and t2.action = 'logout'
and t2.id > t1.id

在成功之前登录3次尝试,然后注销,我应该只获得1个结果而不是3,并且所需答案是id 3和id 4之间的差异。

我可以描述我所追求的内容,但不能将其转换为SQL - 至少没有多个嵌套的SELECT只能运行几个小时(日志文件是8500行)。

  1. 对于每个匹配的+----------+---------------------+--------+----------------------+----------+ | id | logdate | action | user | location | +----------+---------------------+--------+----------------------+----------+ | 1 | 2017-01-03 08:42:25 | login | user1@somewhere.com | place1 | | 2 | 2017-01-03 08:43:35 | login | user1@somewhere.com | place1 | | 3 | 2017-01-03 08:44:45 | login | user1@somewhere.com | place1 | | 4 | 2017-01-03 12:25:08 | logout | user1@somewhere.com | place1 | +----------+---------------------+--------+----------------------+----------+ user,找到任何登录的 last 以及退出 first > 登录后。
  2. 计算这两个事件之间的时差
  3. 忽略没有匹配注销的登录,反之亦然
  4. 汇总每位用户的累计时间
  5. 在上面的示例1中,应该给出2个'登录/注销'事件(ids 1→2 = 2:20:06和3→4 = 0:57:41),总计为1输出行= 3:17: 47,对于例2,它应该给出一个'事件'(ids 3→4 = 3:40:23)总计为1行。

1 个答案:

答案 0 :(得分:0)

这应该可以解决问题:

SELECT MIN(logdate_logout) AS logdate_logout,logdate_login,user,location,id_login,id_logout FROM ( SELECT MAX(logdate_login) AS logdate_login, logdate_logout,user,location,id_login,id_logout FROM ( SELECT t1.id AS id_login, t1.logdate AS logdate_login, t1.user, t1.location, t2.id AS id_logout, t2.logdate AS logdate_logout FROM logtable AS t1 JOIN logtable AS t2 ON (t1.user = t2.user AND t1.location = t2.location AND t1.action = 'login' AND t2.action = 'logout' AND t2.id > t1.id) ORDER BY t1.logdate,t2.logdate ) results_naive_approach GROUP BY logdate_logout,user,location ) inner_query GROUP BY logdate_login,user,location

这将确保您只有匹配的注销之前的登录以及匹配登录之后的注销