Counting requests within a 24-hour window to identify user sessions?

Time: 2019-10-04 15:33:09

Tags: kusto kusto-query-language

Imagine a web server log with lines like the following:

<timestamp> <ip> <user-agent> <product page>

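I'd like a report that:

  • counts the number of requests to product pages per user session within 24 hours, under the following conditions:
  • a unique user is defined by a combination of multiple columns (the sample output below uses IP and user agent)
  • the 24h window starts at the timestamp of the first request to a product page (a 24h window can start at any time)
  • if the timestamps of two requests are 24 hours apart, the later one is treated as a new user session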

For the following log:

2019-1-1 01:00 1.2.3.4 Netscape product 5
2019-1-1 01:01 1.2.3.4 Netscape product 5
2019-1-1 01:00 1.2.3.5 Chrome product 5
2019-1-1 01:01 1.2.3.5 Chrome product 5
2019-1-1 01:59 1.2.3.4 Netscape product 5
2019-1-1 02:00 1.2.3.4 Netscape product 4
2019-1-1 02:01 1.2.3.4 Netscape product 4
2019-1-1 02:02 1.2.3.4 Netscape product 4
2019-1-1 07:43 1.2.3.5 Chrome product 5
2019-1-2  2:01 1.2.3.4 Netscape product 5

the report would produce:

1.2.3.4/Netscape, product 4, 1
1.2.3.4/Netscape, product 5, 2
1.2.3.5/Chrome, product 5, 1

And perhaps a second query would output:

1.2.3.4/Netscape, 6
1.2.3.4/Netscape, 1
1.2.3.5/Chrome, 3

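(the number of requests per user per 24-hour window, which is why 1.2.3.4/Netscape is listed twice)

What would be an example query that can provide both of the result sets above?

Bonus/optional: if the gap between requests within the 24 hours exceeds 30m, that is treated as yet another new session.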

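1 answer:

Answer 0: (score: 0)

This may point you in the right direction (though it won't necessarily perform well or be efficient, depending on the size of the input dataset).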

datatable(timestamp:datetime, ip:string, user_agent:string, product_page:string)
[
    datetime(2019-01-01 01:00), '1.2.3.4', 'Netscape', 'product 5',
    datetime(2019-01-01 01:01), '1.2.3.4', 'Netscape', 'product 5',
    datetime(2019-01-01 01:00), '1.2.3.5', 'Chrome',   'product 5',
    datetime(2019-01-01 01:01), '1.2.3.5', 'Chrome',   'product 5',
    datetime(2019-01-01 01:59), '1.2.3.4', 'Netscape', 'product 5',
    datetime(2019-01-01 02:00), '1.2.3.4', 'Netscape', 'product 4',
    datetime(2019-01-01 02:01), '1.2.3.4', 'Netscape', 'product 4',
    datetime(2019-01-01 02:02), '1.2.3.4', 'Netscape', 'product 4',
    datetime(2019-01-01 07:43), '1.2.3.5', 'Chrome',   'product 5',
    datetime(2019-01-02 02:01), '1.2.3.4', 'Netscape', 'product 5',
]
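// Build a single user key from the columns that identify a user
| extend user = strcat(ip, "/", user_agent)
// row_window_session() needs ordered (serialized) input
| order by user asc, timestamp asc
// Start a new session after 24h, or whenever the user or the product page changes
| extend session_start = row_window_session(timestamp, 24h, 24h, user_agent != prev(user_agent) or product_page != prev(product_page) or ip != prev(ip))
// Count the distinct session starts per user and product page
| summarize session_count = dcount(session_start) by user, product_page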

->

| user             | product_page | session_count |
|------------------|--------------|---------------|
| 1.2.3.4/Netscape | product 5    | 2             |
| 1.2.3.4/Netscape | product 4    | 1             |
| 1.2.3.5/Chrome   | product 5    | 1             |
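For the optional 30m rule from the question, one possible variation is to tighten the third argument of row_window_session(), which caps the gap between neighboring rows. The sketch below is an assumption, not part of the original answer: it uses a reduced three-row sample, and it also sorts by product_page so that each page's requests stay contiguous and switching back and forth between pages does not restart a session.

```kusto
datatable(timestamp:datetime, ip:string, user_agent:string, product_page:string)
[
    datetime(2019-01-01 01:00), '1.2.3.5', 'Chrome', 'product 5',
    datetime(2019-01-01 01:01), '1.2.3.5', 'Chrome', 'product 5',
    datetime(2019-01-01 07:43), '1.2.3.5', 'Chrome', 'product 5',
]
| extend user = strcat(ip, "/", user_agent)
// Assumption: group each user's rows by page before ordering by time
| order by user asc, product_page asc, timestamp asc
// 24h = max distance from the session's first request,
// 30m = max gap between two neighboring requests
| extend session_start = row_window_session(timestamp, 24h, 30m, user != prev(user) or product_page != prev(product_page))
| summarize session_count = dcount(session_start) by user, product_page
```

With this sample, the 07:43 request is more than 30m after the 01:01 request, so it should be counted as a second session for 1.2.3.5/Chrome even though it falls inside the same 24h window.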

For the second query, you could do something like this:

datatable(timestamp:datetime, ip:string, user_agent:string, product_page:string)
[
    datetime(2019-01-01 01:00), '1.2.3.4', 'Netscape', 'product 5',
    datetime(2019-01-01 01:01), '1.2.3.4', 'Netscape', 'product 5',
    datetime(2019-01-01 01:00), '1.2.3.5', 'Chrome',   'product 5',
    datetime(2019-01-01 01:01), '1.2.3.5', 'Chrome',   'product 5',
    datetime(2019-01-01 01:59), '1.2.3.4', 'Netscape', 'product 5',
    datetime(2019-01-01 02:00), '1.2.3.4', 'Netscape', 'product 4',
    datetime(2019-01-01 02:01), '1.2.3.4', 'Netscape', 'product 4',
    datetime(2019-01-01 02:02), '1.2.3.4', 'Netscape', 'product 4',
    datetime(2019-01-01 07:43), '1.2.3.5', 'Chrome',   'product 5',
    datetime(2019-01-02 02:01), '1.2.3.4', 'Netscape', 'product 5',
]
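| extend user = strcat(ip, "/", user_agent)
// Bucket requests per user and calendar day; the result column keeps the name "timestamp"
| summarize count() by user, startofday(timestamp)
// Drop the day bucket so only user and count remain
| project-away timestamp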

->

| user             | count_ |
|------------------|--------|
| 1.2.3.4/Netscape | 6      |
| 1.2.3.5/Chrome   | 3      |
| 1.2.3.4/Netscape | 1      |
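Note that startofday() buckets by calendar day, which happens to match the expected output for this sample but is not the same as a 24h window that starts at the user's first request. If the rolling window matters, a possible alternative (a sketch reusing row_window_session() from the first query; the sample rows and the names window_start and request_count are made up for illustration) is:

```kusto
datatable(timestamp:datetime, ip:string, user_agent:string, product_page:string)
[
    datetime(2019-01-01 23:00), '1.2.3.4', 'Netscape', 'product 5',
    datetime(2019-01-02 01:00), '1.2.3.4', 'Netscape', 'product 4',
    datetime(2019-01-03 00:30), '1.2.3.4', 'Netscape', 'product 5',
]
| extend user = strcat(ip, "/", user_agent)
| order by user asc, timestamp asc
// A new window starts when the user changes or when a request is
// 24h or more after the first request of the current window
| extend window_start = row_window_session(timestamp, 24h, 24h, user != prev(user))
| summarize request_count = count() by user, window_start
```

Here the first two requests are only two hours apart but fall on different calendar days, so startofday() would put them in separate rows, while the window-based version should keep them together (request_count = 2) and open a new window for the third request.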