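Grouping similar URLs

Time: 2019-06-21 04:43:50

Tags: mysql sql google-bigquery

I want to fetch all requests to xmlrpc.php and wp-login.php, and I am using wildcards in the statement to match them.

This causes a problem, though: instead of two rows of output, one for xmlrpc and one for wp-login, I also get a separate row for every URL that has a query string attached. I want all of those requests to be counted, but grouped so that they show up only as xmlrpc.php or wp-login.php.

I'm a MySQL n00b and have been experimenting with SUBSTR, REPLACE, and GROUP_CONCAT, but I can't get it to work.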

WITH
  subq AS (
    SELECT url, COUNT(url) AS count
    FROM `flywheel-production.fastly_logs.ingress_logs`
    WHERE timestamp > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY)
      AND (url LIKE "/wp-login.php%" OR url LIKE "/xmlrpc.php%")
      AND site_hash = "btmpuroizf"
    GROUP BY url
  )

SELECT 
  url,
  count,
  ROUND(count / (SELECT SUM(count) FROM subq) * 100, 2) AS percent
FROM subq
ORDER BY count DESC

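Any help would be greatly appreciated. Thanks!

1 answer:

Answer 0: (score: 0)

For BigQuery Standard SQL

The adjusted query below should do the trick. It uses REGEXP_EXTRACT to keep only the part of each URL before the first "?", so requests are counted per path rather than per full URL.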

#standardSQL
WITH subq AS (
  -- Capture everything before the first "?" (or the whole URL if it has no query string)
  SELECT REGEXP_EXTRACT(url, r'(.*?)(?:\?|$)') AS url, COUNT(url) AS count
  FROM `flywheel-production.fastly_logs.ingress_logs`
  WHERE timestamp > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY)
    AND (url LIKE "/wp-login.php%" OR url LIKE "/xmlrpc.php%")
    AND site_hash = "btmpuroizf"
  -- Group by the first SELECT column (the extracted path), not the raw url column
  GROUP BY 1
)
SELECT
  url,
  count,
  ROUND(count / (SELECT SUM(count) FROM subq) * 100, 2) AS percent
FROM subq
ORDER BY count DESC
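
To check the extraction step on its own, without touching the log table, the same expression can be run against a few inline rows. This is only a minimal sketch: the sample CTE, its four URLs, and the path and requests aliases are made up for illustration and do not come from the real data.

```sql
#standardSQL
-- Hypothetical sample rows standing in for the ingress log table
WITH sample AS (
  SELECT '/wp-login.php' AS url UNION ALL
  SELECT '/wp-login.php?action=register' UNION ALL
  SELECT '/xmlrpc.php?rsd' UNION ALL
  SELECT '/xmlrpc.php'
)
SELECT
  -- Same pattern as in the answer: keep the text before the first "?"
  REGEXP_EXTRACT(url, r'(.*?)(?:\?|$)') AS path,
  COUNT(*) AS requests
FROM sample
GROUP BY path
ORDER BY requests DESC, path
```

With these four rows, the URLs with and without a query string should collapse into two rows, one for /wp-login.php and one for /xmlrpc.php. Giving the extracted column a name that differs from the source column (path instead of url) also sidesteps any question of which one GROUP BY refers to.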