Question

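I have a list of accounts and IP addresses, and I'm trying to get a summary of locations. However, the computation is too heavy for our server, and I'm wondering whether there is a way to change my code so that I can get all the results. The accounts dataset is roughly 150k rows and 2 columns.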

select city, state, count(*)
from (
    select account_id, 256*256*256*one + 256*256*two + 256*three + four as Converted, city, state
    from (
        select *,
               convert(bigint, split_part(ip_address, '.', 1)) as one,
               convert(int, split_part(ip_address, '.', 2)) as two,
               convert(int, split_part(ip_address, '.', 3)) as three,
               convert(int, split_part(ip_address, '.', 4)) as four
        from AccountsIP
    )
    inner join (
        select city, state, ip_from, ip_to
        from ip_ranges a
        left join ip_locations b on a.ip_location_id = b.ip_location_id
        where country = 'US'
    ) b
    on (256*256*256*one + 256*256*two + 256*three + four) between ip_from and ip_to
)
group by city, state

Answer 1

You can create a Python UDF that converts an IP address to a bigint and use it in the BETWEEN condition:

create or replace function ip_to_ipnum (ip varchar)
    returns bigint
    stable as $$
    ip_array = ip.split('.')
    return int(ip_array[0])*16777216+int(ip_array[1])*65536+int(ip_array[2])*256+int(ip_array[3])
$$ language plpythonu;
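
To sanity-check the arithmetic, the body of the UDF can be run as ordinary Python outside the database. This is only a sketch: the sample addresses are made up for illustration, and the function name is reused for convenience rather than being a call into the database. It also suggests why the return type is bigint, since the largest IPv4 address does not fit in a signed 32-bit integer.

```python
def ip_to_ipnum(ip):
    """Convert a dotted-quad IPv4 string to its integer value (same math as the UDF)."""
    ip_array = ip.split('.')
    return (int(ip_array[0]) * 16777216   # 256**3
            + int(ip_array[1]) * 65536    # 256**2
            + int(ip_array[2]) * 256
            + int(ip_array[3]))


# Made-up sample addresses, for illustration only.
small = ip_to_ipnum('1.2.3.4')
largest = ip_to_ipnum('255.255.255.255')

# The largest address exceeds the signed 32-bit range, hence bigint.
needs_bigint = largest > 2**31 - 1
```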

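Also, the bottleneck is probably in your ip_ranges and ip_locations tables, which have to be sorted properly. If your data is US-only, you can delete all the other rows instead of filtering them out, and sort the table by (ip_from, ip_to) so that the lookup is more efficient.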
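
The intuition for why sorted ranges make the lookup cheaper can be sketched in plain Python. This is an analogy, not a description of how the database executes the join: the ranges, cities, and account addresses below are hypothetical, and the sketch assumes the ranges do not overlap. With the ranges sorted by ip_from, each address needs one binary search instead of a comparison against every range.

```python
from bisect import bisect_right
from collections import Counter


def ip_to_ipnum(ip):
    a, b, c, d = (int(part) for part in ip.split('.'))
    return a * 16777216 + b * 65536 + c * 256 + d


# Hypothetical, non-overlapping ranges sorted by ip_from:
# (ip_from, ip_to, city, state)
RANGES = [
    (ip_to_ipnum('10.0.0.0'), ip_to_ipnum('10.0.0.255'), 'Austin', 'TX'),
    (ip_to_ipnum('10.0.1.0'), ip_to_ipnum('10.0.1.255'), 'Denver', 'CO'),
    (ip_to_ipnum('10.0.3.0'), ip_to_ipnum('10.0.3.255'), 'Austin', 'TX'),
]
STARTS = [r[0] for r in RANGES]


def locate(ip):
    """Return (city, state) for ip, or None if no range contains it."""
    n = ip_to_ipnum(ip)
    i = bisect_right(STARTS, n) - 1  # last range starting at or before n
    if i >= 0 and RANGES[i][0] <= n <= RANGES[i][1]:
        return RANGES[i][2], RANGES[i][3]
    return None


# Hypothetical account addresses; unmatched ones are dropped, like an inner join.
accounts = ['10.0.0.5', '10.0.1.9', '10.0.3.200', '10.0.2.1']
summary = Counter(loc for loc in map(locate, accounts) if loc is not None)
```

Here summary plays the role of the final group by city, state count.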

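Also, since the data in ip_ranges and ip_locations does not change often, you can create a physical table that already contains the joined data, so that you don't have to join the two tables in the query every time.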