Using SQL

Time: 2018-10-06 07:59:05

Tags: sql google-bigquery mapping recursive-query

My log table contains data like this:

====================
| src_ip | dest_ip |
====================
| ip01_1 | ip01_2  |
| ip01_1 | ip01_3  |
| ip01_2 | ip01_4  |
| ip01_4 | ip01_5  |
| ip02_1 | ip02_2  |
| ip02_2 | ip02_3  |
====================

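The output I need is a table that pairs each dest_ip with the first IP that started its request chain (first_src_ip).

For example,
* ip01_4 (dest_ip) has ip01_1 as its first_src_ip (ip01_1 -> ip01_2 -> ip01_4)
* ip01_5 (dest_ip) has ip01_1 as its first_src_ip (ip01_1 -> ip01_2 -> ip01_4 -> ip01_5)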
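To make the definition concrete, here is a minimal Python sketch (not part of the original question) that follows parent links from a dest_ip back to the start of its chain. It assumes each IP appears as dest_ip in at most one row and that the data has no cycles; the names EDGES and path_to are hypothetical.

```python
from typing import Dict, List

# Sample rows from the question as (src_ip, dest_ip) pairs.
EDGES = [
    ("ip01_1", "ip01_2"),
    ("ip01_1", "ip01_3"),
    ("ip01_2", "ip01_4"),
    ("ip01_4", "ip01_5"),
    ("ip02_1", "ip02_2"),
    ("ip02_2", "ip02_3"),
]

# Map each dest_ip to the src_ip that requested it (assumes one parent per IP).
PARENT: Dict[str, str] = {dest: src for src, dest in EDGES}


def path_to(dest_ip: str, parent: Dict[str, str]) -> List[str]:
    """Return the chain from the first src_ip down to dest_ip."""
    path = [dest_ip]
    # Keep stepping to the parent until we reach an IP nobody requested.
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return list(reversed(path))


print(" -> ".join(path_to("ip01_5", PARENT)))
```

The first element of the returned path is the first_src_ip; for ip01_5 it prints ip01_1 -> ip01_2 -> ip01_4 -> ip01_5, matching the example above.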

Is there any way to produce a table like the following with a SQL query?

==========================
| first_src_ip | dest_ip |
==========================
| ip01_1       | ip01_2  |
| ip01_1       | ip01_3  |
| ip01_1       | ip01_4  |
| ip01_1       | ip01_5  |
| ip02_1       | ip02_2  |
| ip02_1       | ip02_3  |
==========================

I am considering a self-join, but the number of joins needed is not fixed in advance.
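One way to avoid fixing the number of self-joins is a recursive common table expression. The sketch below is an illustration under stated assumptions rather than a BigQuery answer: it runs the query in SQLite through Python's built-in sqlite3 module so it can be executed locally, uses a hypothetical table name logs, and assumes the data has no cycles and no NULL dest_ip values (NOT IN returns no rows if the subquery contains a NULL). Engines differ in whether and how they support WITH RECURSIVE, so check your engine's documentation before porting it.

```python
import sqlite3

# Sample rows from the question; "logs" is a hypothetical table name.
ROWS = [
    ("ip01_1", "ip01_2"),
    ("ip01_1", "ip01_3"),
    ("ip01_2", "ip01_4"),
    ("ip01_4", "ip01_5"),
    ("ip02_1", "ip02_2"),
    ("ip02_2", "ip02_3"),
]

QUERY = """
WITH RECURSIVE chain(first_src_ip, dest_ip) AS (
  -- Anchor: edges whose src_ip never appears as a dest_ip start a chain.
  SELECT src_ip, dest_ip
  FROM logs
  WHERE src_ip NOT IN (SELECT dest_ip FROM logs)
  UNION ALL
  -- Recursive step: extend each chain by one hop, carrying the root along.
  SELECT c.first_src_ip, l.dest_ip
  FROM chain AS c
  JOIN logs AS l ON l.src_ip = c.dest_ip
)
SELECT first_src_ip, dest_ip
FROM chain
ORDER BY first_src_ip, dest_ip
"""


def first_src_table(rows):
    """Load rows into an in-memory table and run the recursive query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (src_ip TEXT, dest_ip TEXT)")
    conn.executemany("INSERT INTO logs VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for first_src_ip, dest_ip in first_src_table(ROWS):
    print(first_src_ip, dest_ip)
```

The recursion depth adapts to the data, so the chain ip01_1 -> ip01_2 -> ip01_4 -> ip01_5 needs no extra joins; a cycle in the data would make the recursion run forever, which is why the no-cycle assumption matters.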

1 answer:

Answer 0 (score: 0)

Here is an example that supports up to three levels of separation between nodes, e.g. ip1 -> ip2, ip2 -> ip3, ip3 -> ip4:

WITH IPs AS (
  SELECT 'ip01_1' AS src_ip, 'ip01_2' AS dest_ip UNION ALL
  SELECT 'ip01_1', 'ip01_3' UNION ALL
  SELECT 'ip01_2', 'ip01_4' UNION ALL
  SELECT 'ip01_4', 'ip01_5' UNION ALL
  SELECT 'ip02_1', 'ip02_2' UNION ALL
  SELECT 'ip02_2', 'ip02_3'
), Hop1 AS (
  SELECT
    COALESCE(
      (SELECT MIN(ip2.src_ip) FROM IPs AS ip2
       WHERE ip.src_ip = ip2.dest_ip),
      src_ip
    ) AS src_ip,
    dest_ip
  FROM IPs AS ip
), Hop2 AS (
  SELECT
    COALESCE(
      (SELECT MIN(ip2.src_ip) FROM IPs AS ip2
       WHERE ip.src_ip = ip2.dest_ip),
      src_ip
    ) AS src_ip,
    dest_ip
  FROM Hop1 AS ip
)
SELECT src_ip AS first_src_ip, dest_ip
FROM Hop2
ORDER BY first_src_ip, dest_ip;

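Each CTE looks in the original IP mapping for a row whose dest_ip equals the current src_ip and, if one exists, replaces src_ip with that row's src_ip; rows with no such parent keep their src_ip (that is what the COALESCE does). Each CTE therefore walks one hop further up the chain, so supporting deeper chains means adding another HopN CTE.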
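The HopN step can also be repeated until nothing changes, instead of hard-coding the number of CTEs. Below is a minimal sketch, again in SQLite through Python's sqlite3 module rather than BigQuery, that applies one hop per UPDATE and stops once no row's src_ip has a parent left. The table names ips and mapping and the max_hops guard are hypothetical; the guard only exists to stop on cyclic data. The UPDATE uses a WHERE filter instead of COALESCE so that the number of touched rows drops to 0 once every src_ip is a chain root.

```python
import sqlite3

# Sample rows from the question as (src_ip, dest_ip) pairs.
ROWS = [
    ("ip01_1", "ip01_2"),
    ("ip01_1", "ip01_3"),
    ("ip01_2", "ip01_4"),
    ("ip01_4", "ip01_5"),
    ("ip02_1", "ip02_2"),
    ("ip02_2", "ip02_3"),
]

# One hop, equivalent to a single HopN CTE in the answer: a row whose src_ip
# is some row's dest_ip gets src_ip replaced by that parent (MIN picks one
# if several parents exist, as in the answer).
HOP_SQL = """
UPDATE mapping
SET src_ip = (SELECT MIN(ip2.src_ip) FROM ips AS ip2
              WHERE ip2.dest_ip = mapping.src_ip)
WHERE src_ip IN (SELECT dest_ip FROM ips)
"""


def resolve_first_src(rows, max_hops=100):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ips (src_ip TEXT, dest_ip TEXT)")
    conn.execute("CREATE TABLE mapping (src_ip TEXT, dest_ip TEXT)")
    conn.executemany("INSERT INTO ips VALUES (?, ?)", rows)
    conn.executemany("INSERT INTO mapping VALUES (?, ?)", rows)
    for _ in range(max_hops):
        # rowcount is the number of rows this UPDATE touched; 0 means every
        # src_ip is already a root, so further hops would change nothing.
        if conn.execute(HOP_SQL).rowcount == 0:
            break
    else:
        raise RuntimeError("max_hops reached; the data may contain a cycle")
    result = conn.execute(
        "SELECT src_ip AS first_src_ip, dest_ip FROM mapping "
        "ORDER BY first_src_ip, dest_ip"
    ).fetchall()
    conn.close()
    return result


for first_src_ip, dest_ip in resolve_first_src(ROWS):
    print(first_src_ip, dest_ip)
```

For the sample data the loop makes two updating hops (the third UPDATE touches no rows), which matches the two CTEs, Hop1 and Hop2, in the answer.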