使用SQL查询wifi日志用户事件创建会话ID

时间:2018-12-10 07:12:36

标签: sql amazon-athena presto

朋友,

该场景与wifi控制器日志文件有关,该文件跟踪用户活动,例如当用户“加入”,“进入”,从“断开”和“离开”到wifi网络时,我已经清理并上传了结构化数据我想在用户级别为他/她的活动创建会话ID。确定一个会话可以按事件“ joins”和“ leave”之间的所有事件进行分组。为便于理解,我附加了示例数据。 / p>

一些对创建会话ID有用的指针:

  1. 可以通过“ joins”事件来标识用户的新会话,并以“ leave”事件来表示结束。
  2. “与...断开连接”和“离开”具有相同的时间戳,因此,如果按时间戳排序,则顺序可以是随机的。
  3. 对于每个“断开连接”,都有一个“离开”条目,反之亦然。因此,我们可以有“离开”记录而没有“与记录断开连接”。
  4. “ in”表示漫游,因此它应属于同一会话。它将在“加入”和“离开”事件之间。
  5. 您可能会看到记录中只有“输入中”或说“与...断开连接”的记录仅是因为没有完整的日志文件,您仍然希望将它们标记为不同的会话ID。

请帮助我考虑给定的指针来创建会话ID。下面是示例数据,其中包含一些可能的场景,以更好地了解我要查找的内容。 “必需的数据”已分配了虚拟会话ID,这就是它的外观。张贴我的进一步澄清。我想使用SQL来实现。

场景1

原始数据:

usermac datetime event

xx:xx:xx:xx:0e:72   2018-04-19 09:30:58.000 joins
xx:xx:xx:xx:0e:72   2018-04-19 09:51:54.000 in
xx:xx:xx:xx:0e:72   2018-04-19 09:51:54.000 in
xx:xx:xx:xx:0e:72   2018-04-19 12:01:39.000 disconnects from
xx:xx:xx:xx:0e:72   2018-04-19 12:01:39.000 leave
xx:xx:xx:xx:0e:72   2018-04-19 12:46:05.000 joins
xx:xx:xx:xx:0e:72   2018-04-19 15:27:28.000 disconnects from
xx:xx:xx:xx:0e:72   2018-04-19 15:27:28.000 leave

所需格式:

usermac datetime event sessionid

xx:xx:xx:xx:0e:72   2018-04-19 09:30:58.000 joins               1
xx:xx:xx:xx:0e:72   2018-04-19 09:51:54.000 in                  1
xx:xx:xx:xx:0e:72   2018-04-19 09:51:54.000 in                  1
xx:xx:xx:xx:0e:72   2018-04-19 12:01:39.000 disconnects from    1
xx:xx:xx:xx:0e:72   2018-04-19 12:01:39.000 leave               1
xx:xx:xx:xx:0e:72   2018-04-19 12:46:05.000 joins               2
xx:xx:xx:xx:0e:72   2018-04-19 15:27:28.000 disconnects from    2
xx:xx:xx:xx:0e:72   2018-04-19 15:27:28.000 leave               2

场景2

原始数据:

usermac datetime event

xx:xx:xx:xx:xx:80   2018-04-12 21:35:48.000 joins
xx:xx:xx:xx:xx:80   2018-04-12 21:40:01.000 leave
xx:xx:xx:xx:xx:80   2018-04-12 21:40:01.000 disconnects from
xx:xx:xx:xx:xx:80   2018-04-12 21:40:08.000 joins
xx:xx:xx:xx:xx:80   2018-04-12 21:43:45.000 leave
xx:xx:xx:xx:xx:80   2018-04-12 21:43:45.000 disconnects from
xx:xx:xx:xx:xx:80   2018-04-12 21:44:00.000 joins
xx:xx:xx:xx:xx:80   2018-04-12 22:01:29.000 leave

所需格式:

usermac datetime event sessionid

xx:xx:xx:xx:xx:80   2018-04-12 21:35:48.000 joins               3
xx:xx:xx:xx:xx:80   2018-04-12 21:40:01.000 leave               3
xx:xx:xx:xx:xx:80   2018-04-12 21:40:01.000 disconnects from    3
xx:xx:xx:xx:xx:80   2018-04-12 21:40:08.000 joins               4
xx:xx:xx:xx:xx:80   2018-04-12 21:43:45.000 leave               4
xx:xx:xx:xx:xx:80   2018-04-12 21:43:45.000 disconnects from    4
xx:xx:xx:xx:xx:80   2018-04-12 21:44:00.000 joins               5
xx:xx:xx:xx:xx:80   2018-04-12 22:01:29.000 leave               5

场景3

原始数据:

usermac datetime event

xx:xx:xx:xx:xx:92   2018-04-23 09:15:18.000 joins               
xx:xx:xx:xx:xx:92   2018-04-23 11:02:30.000 in                  
xx:xx:xx:xx:xx:92   2018-04-23 11:02:30.000 in                  
xx:xx:xx:xx:xx:92   2018-04-23 11:06:09.000 disconnects from    
xx:xx:xx:xx:xx:92   2018-04-23 11:06:09.000 leave               
xx:xx:xx:xx:xx:92   2018-04-23 11:07:03.000 joins               
xx:xx:xx:xx:xx:92   2018-04-23 13:35:51.000 leave       

所需格式:

usermac datetime event sessionid

xx:xx:xx:xx:xx:92   2018-04-23 09:15:18.000 joins               6
xx:xx:xx:xx:xx:92   2018-04-23 11:02:30.000 in                  6
xx:xx:xx:xx:xx:92   2018-04-23 11:02:30.000 in                  6
xx:xx:xx:xx:xx:92   2018-04-23 11:06:09.000 disconnects from    6
xx:xx:xx:xx:xx:92   2018-04-23 11:06:09.000 leave               6
xx:xx:xx:xx:xx:92   2018-04-23 11:07:03.000 joins               7
xx:xx:xx:xx:xx:92   2018-04-23 13:35:51.000 leave               7

场景4

原始数据:

usermac datetime event

xx:xx:xx:xx:xx:5f   2018-06-23 07:03:24.000 disconnects from
xx:xx:xx:xx:xx:5f   2018-06-23 07:03:24.000 leave
xx:xx:xx:xx:xx:5f   2018-06-23 07:05:45.000 joins
xx:xx:xx:xx:xx:5f   2018-06-23 07:05:48.000 leave
xx:xx:xx:xx:xx:5f   2018-06-23 07:05:48.000 disconnects from
xx:xx:xx:xx:xx:5f   2018-06-23 07:31:59.000 joins
xx:xx:xx:xx:xx:5f   2018-06-23 07:32:02.000 leave
xx:xx:xx:xx:xx:5f   2018-06-23 07:32:02.000 disconnects from

所需格式:

usermac datetime event sessionid

xx:xx:xx:xx:xx:5f   2018-06-23 07:03:24.000 disconnects from    8
xx:xx:xx:xx:xx:5f   2018-06-23 07:03:24.000 leave               8
xx:xx:xx:xx:xx:5f   2018-06-23 07:05:45.000 joins               9
xx:xx:xx:xx:xx:5f   2018-06-23 07:05:48.000 leave               9
xx:xx:xx:xx:xx:5f   2018-06-23 07:05:48.000 disconnects from    9
xx:xx:xx:xx:xx:5f   2018-06-23 07:31:59.000 joins               10
xx:xx:xx:xx:xx:5f   2018-06-23 07:32:02.000 leave               10
xx:xx:xx:xx:xx:5f   2018-06-23 07:32:02.000 disconnects from    10

场景5

原始数据:

usermac datetime event

xx:xx:xx:xx:xx:b2   2018-06-12 07:22:14.000 in          
xx:xx:xx:xx:xx:b2   2018-06-12 07:22:14.000 in
xx:xx:xx:xx:xx:b2   2018-06-12 07:22:40.000 in
xx:xx:xx:xx:xx:b2   2018-06-12 07:22:40.000 in
xx:xx:xx:xx:xx:b2   2018-06-12 07:23:39.000 leave

所需格式:

usermac datetime event sessionid

xx:xx:xx:xx:xx:b2   2018-06-12 07:22:14.000 in               11
xx:xx:xx:xx:xx:b2   2018-06-12 07:22:14.000 in               11
xx:xx:xx:xx:xx:b2   2018-06-12 07:22:40.000 in               11
xx:xx:xx:xx:xx:b2   2018-06-12 07:22:40.000 in               11
xx:xx:xx:xx:xx:b2   2018-06-12 07:23:39.000 leave            11

场景6

原始数据:

usermac datetime event

xx:xx:xx:xx:xx:16   2018-06-23 07:55:26.000 in
xx:xx:xx:xx:xx:16   2018-06-23 07:55:26.000 in

所需格式:

usermac datetime event sessionid

xx:xx:xx:xx:xx:16   2018-06-23 07:55:26.000 in               12
xx:xx:xx:xx:xx:16   2018-06-23 07:55:26.000 in               12

场景7

原始数据:

usermac datetime event

xx:xx:xx:xx:xx:m6   2018-06-24 06:56:30.000 rejoins
xx:xx:xx:xx:xx:m6   2018-06-24 06:56:51.000 rejoins
xx:xx:xx:xx:xx:m6   2018-06-24 06:56:59.000 leave
xx:xx:xx:xx:xx:m6   2018-06-24 06:56:59.000 disconnects from
xx:xx:xx:xx:xx:m6   2018-06-24 06:57:03.000 joins

所需格式:

usermac datetime event sessionid

xx:xx:xx:xx:xx:m6   2018-06-24 06:56:30.000 rejoins           13
xx:xx:xx:xx:xx:m6   2018-06-24 06:56:51.000 rejoins           13
xx:xx:xx:xx:xx:m6   2018-06-24 06:56:59.000 leave             13
xx:xx:xx:xx:xx:m6   2018-06-24 06:56:59.000 disconnects from  13
xx:xx:xx:xx:xx:m6   2018-06-24 06:57:03.000 joins             14

到现在为止一直在尝试:

with cte1 as ( SELECT row_number() over(partition by usermac ORDER BY datetime) indexno, row_number() over(partition by event ORDER BY usermac, datetime) sessionid, datetime, usermac, event from t8_wfc_log_apr_jun_syslog_cast where usermac='xx:xx:xx:xx:16' order by usermac, datetime ) SELECT case when event = 'joins' then sessionid else (select max(sessionid) from cte1 innie where innie.usermac = oute.usermac and innie.indexno < oute.indexno and innie.event = 'joins') end sessionid, datetime, usermac, event FROM cte1 oute order by usermac, datetime;

0 个答案:

没有答案