Question

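I have two pandas DataFrames.

"DF1" is traffic from logs, with hour, time, and destination address columns, and contains approximately 2 million rows.
"DF2" is ASN information, where I took the start and end addresses and converted them to CIDR notation (approximately 800,000 rows).

I'm trying to look up each address from "DF1" in the CIDR field of "DF2", then add the ASN number associated with the matching CIDR block to "DF1". Since the datasets are this large, what is a good way to do this?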

DF1
    address 
0   8.47.124.1  
1   63.215.97.2 
2   8.47.124.2  
3   63.215.97.1 
4   8.47.124.1
5   8.47.124.2  
6   8.47.124.1  
7   8.47.124.1  

DF2
   ASN   CIDR
0   1    [IPNetwork('8.47.124.0/29')]
1   1    [IPNetwork('8.45.244.0/29')]
2   2    [IPNetwork('63.215.97.8/29')]
3   1    [IPNetwork('8.13.232.64/27')]
4   2    [IPNetwork('63.215.97.16/29')]
5   2    [IPNetwork('63.215.97.24/29')]
6   1    [IPNetwork('8.13.228.128/27')]
7   1    [IPNetwork('8.13.228.96/27')]

Desired output:

DF1
   address         asn
0   8.47.124.1     1
1   63.215.97.2    2
2   8.47.124.2     1
3   63.215.97.1    2
4   8.47.124.1     1
5   8.47.124.2     1
6   8.47.124.1     1
7   8.47.124.1     1

I'm getting close:

import ipaddress

#Create new column "ASN" to DF1
DF1["ASN"] = ""

#While loop uses the library ipaddress that checks if address from "DF1" is in CIDR block of "DF2"
index = 0  
while(index < lenth):
  DF1["sourceaddress"].iloc[index] in DF2["CIDR"].iloc[index]

  DF1["ASN"].iloc[index] = DF2["ASN"].iloc[index]
  index = index + 1

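But this just gives me the ASNs copied straight down the column. I think it is only giving me the ASN at the same position in DF2, not the ASN of the row where the IP in DF1 matches the CIDR in DF2.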
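
For reference, the membership test the loop is reaching for can be sketched as below. This is a minimal illustration, not the original code: it assumes the standard-library `ipaddress` module and plain CIDR strings (the DF2 shown above appears to hold `IPNetwork` objects from another library), and the `blocks` dictionary is a hypothetical stand-in for DF2.

```python
import ipaddress

addr = ipaddress.ip_address("8.47.124.1")

# Hypothetical CIDR -> ASN mapping standing in for DF2
blocks = {
    "8.45.244.0/29": 1,
    "8.47.124.0/29": 1,
    "63.215.97.8/29": 2,
}

# The result of `in` has to drive a branch; a bare `x in y` expression
# is evaluated and then thrown away, so it never affects the assignment.
match = None
for cidr, asn in blocks.items():
    if addr in ipaddress.ip_network(cidr):
        match = asn
        break

print(match)
```

Note that this scans the blocks for every address instead of pairing row `i` of DF1 with row `i` of DF2, which makes it correct but far too slow for millions of rows.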

Answer 1

Consider building a substring of each IP address, from the first digit up to the last period, and then merging the two frames on it:

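import pandas as pd

# Keep everything before the last dot, e.g. '8.47.124.1' -> '8.47.124'
DF1['IP_Sub'] = DF1['address'].apply(lambda x: x[0:x.rindex('.')])

# Skip the 12-character "[IPNetwork('" prefix, then cut at the last dot
# (this assumes the CIDR column is stored as a string)
DF2['IP_Sub'] = DF2['CIDR'].apply(lambda x: x[12:x.rindex('.')])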
DF2 = DF2[['IP_Sub', 'ASN']].drop_duplicates()

# MERGE DFs
DF3 = pd.merge(DF1, DF2, on='IP_Sub')[['address', 'ASN']]

print(DF3)
#        address  ASN
# 0   8.47.124.1    1
# 1   8.47.124.2    1
# 2   8.47.124.1    1
# 3   8.47.124.2    1
# 4   8.47.124.1    1
# 5   8.47.124.1    1
# 6  63.215.97.2    2
# 7  63.215.97.1    2

# MERGE DFs (MAINTAIN ORIGINAL INDEX)
DF3 = (DF1.reset_index()
          .merge(DF2, on='IP_Sub', sort=False)
          .filter(['index', 'address', 'ASN'])
          .set_index('index').sort_index()
          .rename_axis(None))
print(DF3)
#        address  ASN
# 0   8.47.124.1    1
# 1  63.215.97.2    2
# 2   8.47.124.2    1
# 3  63.215.97.1    2
# 4   8.47.124.1    1
# 5   8.47.124.2    1
# 6   8.47.124.1    1
# 7   8.47.124.1    1
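
The substring merge above matches on the first three octets only, so it treats every address in the same /24 as belonging to the block. If true CIDR containment is needed, one possible alternative is a sorted range lookup. The following is a rough sketch, not the answer's method: it assumes the CIDR column holds plain strings, that the blocks do not overlap or nest, and it uses lowercase `df1`/`df2` with made-up sample rows.

```python
import ipaddress

import numpy as np
import pandas as pd

# Hypothetical sample data; CIDR is assumed to be a plain string here
df1 = pd.DataFrame({"address": ["8.47.124.1", "63.215.97.2",
                                "8.47.124.2", "63.215.97.9"]})
df2 = pd.DataFrame({
    "ASN": [1, 1, 2, 2],
    "CIDR": ["8.47.124.0/29", "8.45.244.0/29",
             "63.215.97.8/29", "63.215.97.16/29"],
})

# Turn every CIDR block into an inclusive integer range [start, end]
nets = df2["CIDR"].map(ipaddress.ip_network)
ranges = pd.DataFrame({
    "start": nets.map(lambda n: int(n.network_address)),
    "end": nets.map(lambda n: int(n.broadcast_address)),
    "ASN": df2["ASN"].to_numpy(),
}).sort_values("start", ignore_index=True)

# Convert each address to an integer so it can be compared to the ranges
addr = df1["address"].map(lambda a: int(ipaddress.ip_address(a))).to_numpy()

# For each address, locate the last block whose start is <= the address
pos = np.searchsorted(ranges["start"].to_numpy(), addr, side="right") - 1
safe = np.clip(pos, 0, len(ranges) - 1)

# A hit requires the address to also fall at or below that block's end
hit = (pos >= 0) & (addr <= ranges["end"].to_numpy()[safe])

# Unmatched addresses become <NA> instead of receiving a wrong ASN
candidates = pd.Series(ranges["ASN"].to_numpy()[safe], index=df1.index)
df1["ASN"] = candidates.where(hit).astype("Int64")

print(df1)
```

With these sample blocks, `63.215.97.2` gets no ASN because it lies outside `63.215.97.8/29`, whereas the three-octet merge would have assigned it ASN 2. The binary search does one lookup per address against the sorted block starts, which should scale better than comparing every address with every block.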
