I have a large database of (operator name, IP) pairs like this:
+-------+------------------+
| Name  | IP               |
+-------+------------------+
| A     | 41.74.63.255     |
+-------+------------------+
| B     | 168.167.255.255  |
+-------+------------------+
| ...   | ...              |
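I would like to automatically build a list of "valid ranges" for each operator, based on how often addresses appear in the database, i.e.: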
Operator "A" -> [(range A), (range B), (range C)...]
Operator "B" -> [(range A)...]
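That is, some kind of clustering, so that I can then detect whether a new IP coming from operator A belongs to a valid cluster or to an anomalous one.
What would be a good place to start?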
Answer 0: (score: 1)
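There are two possible approaches.
The first is to cluster, or more precisely to sort and then merge. It is a straightforward approach that is easy to understand and implement, but it may be slow and inefficient.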
    SRC_LIST = (operator, ip)        // source list of (operator, IP)
    SORTED_IPS[] = (ip)              // map of per-operator lists of sorted IPs
    RANGES[] = (startIP, endIP)      // map of per-operator lists of ranges

    // sort IPs to per-operator lists of IPs
    for E in SRC_LIST
        SortInsertIP(SORTED_IPS[E.operator], E.ip)

    // merge continuous IPs into ranges
    for OP in SORTED_IPS
        for IP in SORTED_IPS[OP]
            MergeIP(RANGES[OP], IP)

    // sort merged lists based on their appearance frequency
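To make the sort-and-merge pseudocode a bit more concrete, here is a minimal Python sketch of it. The function names and the `max_gap` parameter are my own assumptions, not part of the pseudocode: `max_gap` is a hypothetical tuning knob that decides how far apart two sorted addresses may be and still land in the same range. Each range also keeps a hit count, so the lists can be ranked by appearance frequency afterwards.

```python
import bisect
import ipaddress
from collections import defaultdict


def merge_sorted_ranges(src_list, max_gap=1):
    """Turn (operator, IPv4 string) pairs into per-operator (start, end, hits) ranges.

    max_gap is an assumed tuning knob: a sorted address joins the current
    range when it is at most max_gap above that range's end.
    """
    sorted_ips = defaultdict(list)
    for operator, ip in src_list:
        sorted_ips[operator].append(int(ipaddress.IPv4Address(ip)))

    ranges = {}
    for operator, ips in sorted_ips.items():
        ips.sort()
        merged = []
        for ip in ips:
            if merged and ip - merged[-1][1] <= max_gap:
                merged[-1][1] = ip        # extend the current range
                merged[-1][2] += 1        # one more hit (duplicates count too)
            else:
                merged.append([ip, ip, 1])
        ranges[operator] = [tuple(r) for r in merged]
    return ranges


def is_valid(ranges, operator, ip):
    """True if ip falls inside one of the ranges built for operator."""
    value = int(ipaddress.IPv4Address(ip))
    op_ranges = ranges.get(operator, [])
    # Ranges are sorted by start: find the last one starting at or before value.
    pos = bisect.bisect_right(op_ranges, (value, float("inf"), float("inf"))) - 1
    return pos >= 0 and op_ranges[pos][1] >= value


def by_frequency(ranges, operator):
    """Ranges of one operator, most frequently hit first."""
    return sorted(ranges.get(operator, []), key=lambda r: r[2], reverse=True)
```

With data like the sample in the question, where addresses end in .255, something like `max_gap=256` would probably be needed for neighbouring /24 blocks to merge. That value is a guess to tune against the real data.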
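The second approach uses a lookup table. It can be very efficient and fast, but it has a few prerequisites: the IPs in the database all end in .255 or .0, and the number of operators is limited, ideally to at most 255 so that an operator index fits in one byte. If these hold, we can use just the first 3 bytes of the IPv4 address as an index into a table of 2^24 operators. Then we only have to merge consecutive indexes.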
    SRC_LIST = (operator, ip)        // source list of (operator, IP)
    OPERATORS[] = (idx)              // map each operator to an index starting from 1
    TBL24[2^24] = (op_idx)           // table of 2^24 operator indexes, where 0 is an unused entry
    RANGES[] = (startIP, endIP)      // map of per-operator lists of ranges

    // map DB to TBL24 table
    for E in SRC_LIST
        op_idx = GetOperatorIndex(OPERATORS[], E.operator)
        ip_idx = uint32(E.ip) >> 8   // convert 32-bit IPv4 to 24-bit index
        TBL24[ip_idx] = op_idx

    // find consecutive operators in the map
    startIdx = 0
    while startIdx < 2^24
        endIdx = startIdx + 1
        // endIdx ends up one past the last index holding the same operator
        while endIdx < 2^24 and TBL24[startIdx] == TBL24[endIdx]
            endIdx = endIdx + 1
        // append found range to the per-operator list
        if TBL24[startIdx] != 0      // i.e. non-empty
            // shift the 24-bit indexes back to 32-bit IPs; the end IP is inclusive
            AppendList(RANGES[TBL24[startIdx]], (startIdx << 8, (endIdx << 8) - 1))
        startIdx = endIdx

    // sort merged lists based on their appearance frequency
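For reference, the table approach could look roughly like this in Python. It is only a sketch under the prerequisites stated above (one byte per operator, /24 granularity). The function names are mine, and limiting the scan to the populated part of the table (`lo` to `hi`) is a shortcut I added because a pure-Python loop over all 2^24 entries is slow. Ranges are returned with an inclusive end address.

```python
import ipaddress
from collections import defaultdict

PREFIX_BITS = 24                  # index the table by the first 3 bytes
HOST_BITS = 32 - PREFIX_BITS


def build_table_ranges(src_list):
    """src_list: iterable of (operator, dotted-quad IPv4 string) pairs."""
    op_index = {}                          # operator name -> index, starting from 1
    op_names = [None]                      # index -> operator name, 0 means "unused"
    tbl24 = bytearray(1 << PREFIX_BITS)    # one byte per /24 block
    lo, hi = len(tbl24), -1                # bounds of the populated part of the table

    for operator, ip in src_list:
        if operator not in op_index:
            if len(op_names) > 255:
                raise ValueError("more than 255 operators do not fit in one byte")
            op_index[operator] = len(op_names)
            op_names.append(operator)
        idx = int(ipaddress.IPv4Address(ip)) >> HOST_BITS
        tbl24[idx] = op_index[operator]    # a later entry overwrites an earlier one
        lo, hi = min(lo, idx), max(hi, idx)

    # Find runs of consecutive indexes that hold the same operator.
    ranges = defaultdict(list)
    start = lo
    while start <= hi:
        end = start + 1
        while end <= hi and tbl24[end] == tbl24[start]:
            end += 1
        if tbl24[start] != 0:              # skip runs of unused entries
            first = ipaddress.IPv4Address(start << HOST_BITS)
            last = ipaddress.IPv4Address((end << HOST_BITS) - 1)
            ranges[op_names[tbl24[start]]].append((first, last))
        start = end
    return tbl24, op_index, dict(ranges)


def is_known(tbl24, op_index, operator, ip):
    """True if the /24 block of ip was seen for this operator."""
    idx = int(ipaddress.IPv4Address(ip)) >> HOST_BITS
    return operator in op_index and tbl24[idx] == op_index[operator]
```

Note that checking a new IP does not need the range lists at all: `is_known` is a single table read, which is where the speed of this approach comes from. The table stores only the last operator seen for each /24, so it carries no frequency information. If you need that, you would have to keep a separate counter per block.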
There is also a way to use not only /24 prefixes but more specific prefixes as well. Have a look at the paper "Routing Lookups in Hardware at Memory Access Speeds" or at the software implementation of the DIR-24-8 algorithm in DPDK.
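To give a rough idea of how the two-level scheme handles prefixes longer than /24, here is a toy sketch. It is not DPDK's implementation and it simplifies the paper's layout: a dict stands in for the flat 2^24-entry array, the class and method names are hypothetical, and prefixes must be added shortest first so that the more specific ones win.

```python
import ipaddress

EXT = object()  # marker: "this /24 is split, look in a second-level block"


class TwoLevelTable:
    """Toy two-level lookup: 24-bit first level, 8-bit second level."""

    def __init__(self):
        self.tbl24 = {}   # sparse stand-in for a flat 2^24-entry array
        self.tbl8 = []    # second-level blocks, 256 entries each

    def add(self, cidr, operator):
        """Add a prefix. Call with shorter prefixes first so longer ones win."""
        net = ipaddress.IPv4Network(cidr)
        first = int(net.network_address)
        last = int(net.broadcast_address)
        if net.prefixlen <= 24:
            # Fill every /24 slot that the prefix covers.
            for idx in range(first >> 8, (last >> 8) + 1):
                self.tbl24[idx] = operator
        else:
            idx = first >> 8
            entry = self.tbl24.get(idx)
            if not (isinstance(entry, tuple) and entry[0] is EXT):
                # Split the /24: seed the block with whatever covered it before.
                self.tbl8.append([entry] * 256)
                entry = (EXT, len(self.tbl8) - 1)
                self.tbl24[idx] = entry
            block = self.tbl8[entry[1]]
            for low in range(first & 0xFF, (last & 0xFF) + 1):
                block[low] = operator

    def lookup(self, ip):
        """Return the operator covering ip, or None if nothing matches."""
        value = int(ipaddress.IPv4Address(ip))
        entry = self.tbl24.get(value >> 8)
        if isinstance(entry, tuple) and entry[0] is EXT:
            return self.tbl8[entry[1]][value & 0xFF]
        return entry
```

For example, after `add("41.74.62.0/23", "A")` followed by `add("41.74.63.128/25", "B")`, a lookup of 41.74.63.200 goes through the second-level block and returns "B". A lookup of 41.74.63.5 still returns "A".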