假设我有一个可能重叠或不重叠的IP范围列表(仅限上一个术语):
('1.1.1.1-7', '2.2.2.2-10', '3.3.3.3-3.3.3.3', '1.1.1.4-25', '2.2.2.4-6')
我正在寻找一种方法来识别任何重叠范围并将它们合并为单个范围。
('1.1.1.1-25', '2.2.2.2-10', '3.3.3.3-3')
当前的算法思路是将所有范围扩展为所有IP的列表,消除重复,排序和合并任何连续的IP。
还有更多python-esque算法建议吗?
答案 0 :(得分:6)
这是我的版本,作为一个模块。我的算法与他的答案中提到的一个lunixbochs相同,并且从范围字符串到整数和后面的转换很好地模块化。
import socket, struct
def ip2long(ip):
packed = socket.inet_aton(ip)
return struct.unpack("!L", packed)[0]
def long2ip(n):
unpacked = struct.pack('!L', n)
return socket.inet_ntoa(unpacked)
def expandrange(rng):
# expand '1.1.1.1-7' to ['1.1.1.1', '1.1.1.7']
start, end = [ip.split('.') for ip in rng.split('-')]
return map('.'.join, (start, start[:len(start) - len(end)] + end))
def compressrange((start, end)):
# compress ['1.1.1.1', '1.1.1.7'] to '1.1.1.1-7'
start, end = start.split('.'), end.split('.')
return '-'.join(map('.'.join,
(start, end[next((i for i in range(4) if start[i] != end[i]), 3):])))
def strings_to_ints(ranges):
# turn range strings into list of lists of ints
return [map(ip2long, rng) for rng in map(expandrange, ranges)]
def ints_to_strings(ranges):
# turn lists of lists of ints into range strings
return [compressrange(map(long2ip, rng)) for rng in ranges]
def consolodate(ranges):
# join overlapping ranges in a sorted iterable
iranges = iter(ranges)
startmin, startmax = next(iranges)
for endmin, endmax in iranges:
# leave out the '+ 1' if you want to join overlapping ranges
# but not consecutive ranges.
if endmin <= (startmax + 1):
startmax = max(startmax, endmax)
else:
yield startmin, startmax
startmin, startmax = endmin, endmax
yield startmin, startmax
def convert_consolodate(ranges):
# convert a list of possibly overlapping ip range strings
# to a sorted, consolodated list of non-overlapping ip range strings
return list(ints_to_strings(consolodate(sorted(strings_to_ints(ranges)))))
if __name__ == '__main__':
ranges = ('1.1.1.1-7',
'2.2.2.2-10',
'3.3.3.3-3.3.3.3',
'1.1.1.4-25',
'2.2.2.4-6')
print convert_consolodate(ranges)
# prints ['1.1.1.1-25', '2.2.2.2-10', '3.3.3.3-3']
答案 1 :(得分:1)
一旦我遇到同样的问题。唯一的区别是我必须有效地将线段保留在列表中。这是蒙特卡罗模拟。并且必须将新随机生成的线段添加到现有的已排序和合并的线段中。
我使用 lunixbochs 的答案将算法调整为您的问题,将IP转换为整数。
此解决方案允许将新IP范围添加到已合并范围的现有列表中(而其他解决方案依赖于已将合并范围列表排序,并且不允许将新范围添加到已合并范围列表)。它是在add_range
函数中完成的,它使用bisect
模块找到插入新IP范围的位置,然后删除冗余IP间隔并插入带有调整边界的新范围,以便新范围包含所有删除的范围。
import socket
import struct
import bisect
def ip2long(ip):
'''IP to integer'''
packed = socket.inet_aton(ip)
return struct.unpack("!L", packed)[0]
def long2ip(n):
'''integer to IP'''
unpacked = struct.pack('!L', n)
return socket.inet_ntoa(unpacked)
def get_ips(s):
'''Convert string IP interval to tuple with integer representations of boundary IPs
'1.1.1.1-7' -> (a,b)'''
s1,s2 = s.split('-')
if s2.isdigit():
s2 = s1[:-1] + s2
return (ip2long(s1),ip2long(s2))
def add_range(iv,R):
'''add new Range to already merged ranges inplace'''
left,right = get_ips(R)
#left,right are left and right boundaries of the Range respectively
#If this is the very first Range just add it to the list
if not iv:
iv.append((left,right))
return
#Searching the first interval with left_boundary < left range side
p = bisect.bisect_right(iv, (left,right)) #place after the needed interval
p -= 1 #calculating the number of interval basing on the position where the insertion is needed
#Interval: |----X----| (delete)
#Range: <--<--|----------| (extend)
#Detect if the left Range side is inside the found interval
if p >=0: #if p==-1 then there was no interval found
if iv[p][1]>= right:
#Detect if the Range is completely inside the interval
return #drop the Range; I think it will be a very common case
if iv[p][1] >= left-1:
left = iv[p][0] #extending the left Range interval
del iv[p] #deleting the interval from the interval list
p -= 1 #correcting index to keep the invariant
#Intervals: |----X----| |---X---| (delete)
#Range: |-----------------------------|
#Deleting all the intervals which are inside the Range interval
while True:
p += 1
if p >= len(iv) or iv[p][0] >= right or iv[p][1] > right:
'Stopping searching for the intervals which is inside the Range interval'
#there are no more intervals or
#the interval is to the right of the right Range side
# it's the next case (right Range side is inside the interval)
break
del iv[p] #delete the now redundant interval from the interval list
p -= 1 #correcting index to keep the invariant
#Interval: |--------X--------| (delete)
#Range: |-----------|-->--> (extend)
#Working the case when the right Range side is inside the interval
if p < len(iv) and iv[p][0] <= right-1:
#there is no condition for right interval side since
#this case would have already been worked in the previous block
right = iv[p][1] #extending the right Range side
del iv[p] #delete the now redundant interval from the interval list
#No p -= 1, so that p is no pointing to the beginning of the next interval
#which is the position of insertion
#Inserting the new interval to the list
iv.insert(p, (left,right))
def merge_ranges(ranges):
'''Merge the ranges'''
iv = []
for R in ranges:
add_range(iv,R)
return ['-'.join((long2ip(left),long2ip(right))) for left,right in iv]
ranges = ('1.1.1.1-7', '2.2.2.2-10', '3.3.3.3-3.3.3.3', '1.1.1.4-25', '2.2.2.4-6')
print(merge_ranges(ranges))
输出:
['1.1.1.1-1.1.1.25', '2.2.2.2-2.2.2.10', '3.3.3.3-3.3.3.3']
这对我来说代码很有趣!谢谢你:))
答案 2 :(得分:1)
将您的范围转换为数字对。这些函数将单个IP转换为整数值和从整数值转换。
def ip2long(ip):
packed = socket.inet_aton(ip)
return struct.unpack("!L", packed)[0]
def long2ip(n):
unpacked = struct.pack('!L', n)
return socket.inet_ntoa(unpacked)
现在,您可以将每个范围的边缘排序/合并为数字,然后转换回IP以获得良好的表示。关于merging time ranges的这个问题有一个很好的算法。
将1.1.1.1-1.1.1.2
和1.1.1.1-2
的字符串解析为一对数字。对于后一种格式,您可以这样做:
x = '1.1.1.1-2'
first, add = x.split('-')
second = first.rsplit('.', 1)[0] + '.' + add
pair = ip2long(first), ip2long(second)
使用简单的数字比较合并重叠范围。
转换回字符串表示形式(仍采用后一格式):
first, second = pair
first = long2ip(first) + '-' + long2ip(second).rsplit('.', 1)[1]
答案 3 :(得分:0)
统一你的ips格式,将范围转换为一对整数。
现在任务更简单 - “巩固”整数范围。我相信有很多现有的有效算法可以做到这一点,只有我的天真尝试:
>>> orig_ranges = [(1,5), (7,12), (2,3), (13,13), (13,17)] # should result in (1,5), (7,12), (13,17)
>>> temp_ranges = {}
>>> for r in orig_ranges:
temp_ranges.setdefault(r[0], []).append('+')
temp_ranges.setdefault(r[1], []).append('-')
>>> start_count = end_count = 0
>>> start = None
>>> for key in temp_ranges:
if start is None:
start = key
start_count += temp_ranges[key].count('+')
end_count += temp_ranges[key].count('-')
if start_count == end_count:
print start, key
start = None
start_count = end_count = 0
1 5
7 12
13 17
一般的想法是下一个:在我们将范围1放到另一个(在temp_ranges
dict中)之后,我们可以简单地通过计算原始范围的开始和结尾来找到新的组合范围;一旦我们获得平等,我们找到了一个统一的范围。
答案 4 :(得分:0)
如果你需要em,我有这些谎言,使用socket / struct可能是更好的方法
def ip_str_to_int(address):
"""Convert IP address in form X.X.X.X to an int.
>>> ip_str_to_int('74.125.229.64')
1249764672
"""
parts = address.split('.')
parts.reverse()
return sum(int(v) * 256 ** i for i, v in enumerate(parts))
def ip_int_to_str(address):
"""Convert IP address int into the form X.X.X.X.
>>> ip_int_to_str(1249764672)
'74.125.229.64'
"""
parts = [(address & 255 << 8 * i) >> 8 * i for i in range(4)]
parts.reverse()
return '.'.join(str(x) for x in parts)
答案 5 :(得分:0)