Problem with a while loop when looking up IPs using the ipaddress library

Time: 2018-07-31 19:21:10

Tags: python pandas dataframe

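I have 2 pandas DataFrames.

  • "DF1" is traffic from logs, with hour, time, and destination address. It contains approximately 2 million rows.
  • "DF2" is ASN information, where I took the start and end addresses and converted them to CIDR notation. It contains approximately 800,000 rows.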
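
For reference, the start/end to CIDR conversion described above can be sketched with the standard library. This is only an illustration: the sample frames show `IPNetwork` objects, which suggests a different library was actually used, and the helper name `range_to_cidrs` is hypothetical.

```python
import ipaddress

def range_to_cidrs(start, end):
    """Return the CIDR blocks that exactly cover start..end (inclusive)."""
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    # summarize_address_range yields the minimal list of networks for the range
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]

print(range_to_cidrs("8.47.124.0", "8.47.124.7"))
# ['8.47.124.0/29']
```

A range that does not align to a single block comes back as several CIDR entries, so one ASN range may turn into more than one row.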

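I am trying to look up the addresses from "DF1" in the CIDR field of "DF2", and then add the ASN number associated with the matching CIDR block to "DF1". Since my data sets are this large, what is a good way to do this?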

DF1
    address 
0   8.47.124.1  
1   63.215.97.2 
2   8.47.124.2  
3   63.215.97.1 
4   8.47.124.1
5   8.47.124.2  
6   8.47.124.1  
7   8.47.124.1  

DF2
   ASN   CIDR
0   1    [IPNetwork('8.47.124.0/29')]
1   1    [IPNetwork('8.45.244.0/29')]
2   2    [IPNetwork('63.215.97.8/29')]
3   1    [IPNetwork('8.13.232.64/27')]
4   2    [IPNetwork('63.215.97.16/29')]
5   2    [IPNetwork('63.215.97.24/29')]
6   1    [IPNetwork('8.13.228.128/27')]
7   1    [IPNetwork('8.13.228.96/27')]

Desired output:

DF1
   address         asn
0   8.47.124.1     1
1   63.215.97.2    2
2   8.47.124.2     1
3   63.215.97.1    2
4   8.47.124.1     1
5   8.47.124.2     1
6   8.47.124.1     1
7   8.47.124.1     1

I'm getting close:

import ipaddress

# Create new column "ASN" in DF1
DF1["ASN"] = ""

# While loop uses the ipaddress library to check whether the address
# from "DF1" is in the CIDR block of "DF2"
index = 0
length = len(DF1)
while index < length:
    DF1["address"].iloc[index] in DF2["CIDR"].iloc[index]

    DF1["ASN"].iloc[index] = DF2["ASN"].iloc[index]
    index = index + 1

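But this just gives me the ASNs straight down the column. I think it is giving me the ASN at the same position in DF2, not the ASN of the CIDR block in DF2 that matches the IP in DF1.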
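
To illustrate the difference between a positional copy and a real match, here is a minimal sketch of a lookup that tests membership. The miniature frames `df1` and `df2`, the helper `find_asn`, and storing the CIDR blocks as plain strings are all assumptions made for the example, not the real data.

```python
import ipaddress
import pandas as pd

# Hypothetical miniature versions of DF1 and DF2 (CIDR kept as plain strings)
df1 = pd.DataFrame({"address": ["8.47.124.1", "8.13.232.70", "10.0.0.1"]})
df2 = pd.DataFrame({"ASN": [1, 1, 2],
                    "CIDR": ["8.47.124.0/29", "8.13.232.64/27", "63.215.97.8/29"]})

# Parse every CIDR block once, paired with its ASN
networks = [(ipaddress.ip_network(cidr), asn)
            for cidr, asn in zip(df2["CIDR"], df2["ASN"])]

def find_asn(address):
    """Return the ASN of the first block containing address, else None."""
    ip = ipaddress.ip_address(address)
    for net, asn in networks:
        if ip in net:  # membership test whose result is actually used
            return asn
    return None

df1["ASN"] = df1["address"].map(find_asn)
print(df1)
```

This scans every block for every address, so it only shows the matching logic. At roughly 2 million by 800,000 rows it would likely be far too slow.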

1 answer:

Answer 0 (score: 1)

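Consider building a substring of each IP address that runs from the first digit up to the last period, then merging the two frames on that substring: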

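import pandas as pd

# Key for DF1: everything before the last period, e.g. '8.47.124.1' -> '8.47.124'
DF1['IP_Sub'] = DF1['address'].apply(lambda x: x[0:x.rindex('.')])

# Key for DF2: assumes CIDR holds the string "[IPNetwork('8.47.124.0/29')]";
# index 12 skips the leading "[IPNetwork('"
DF2['IP_Sub'] = DF2['CIDR'].apply(lambda x: x[12:x.rindex('.')])
DF2 = DF2[['IP_Sub', 'ASN']].drop_duplicates()

# MERGE DFs
DF3 = pd.merge(DF1, DF2, on='IP_Sub')[['address', 'ASN']]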

print(DF3)
#        address  ASN
# 0   8.47.124.1    1
# 1   8.47.124.2    1
# 2   8.47.124.1    1
# 3   8.47.124.2    1
# 4   8.47.124.1    1
# 5   8.47.124.1    1
# 6  63.215.97.2    2
# 7  63.215.97.1    2

# MERGE DFs (MAINTAIN ORIGINAL INDEX)
DF3 = (DF1.reset_index()
          .merge(DF2, on='IP_Sub', sort=False)
          .filter(['index', 'address', 'ASN'])
          .set_index('index').sort_index()
          .rename_axis(None))
print(DF3)
#        address  ASN
# 0   8.47.124.1    1
# 1  63.215.97.2    2
# 2   8.47.124.2    1
# 3  63.215.97.1    2
# 4   8.47.124.1    1
# 5   8.47.124.2    1
# 6   8.47.124.1    1
# 7   8.47.124.1    1
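
Note that the merge above compares only the first three octets, so it does not check that an address really falls inside a block. For example, 63.215.97.2 lies outside 63.215.97.8/29, which covers .8 through .15. If exact CIDR membership matters, one possible alternative is to turn each block into an integer start/end range and binary-search it. The sketch below assumes the blocks do not overlap and that CIDR is stored as plain strings. The frames `df1` and `df2` and all intermediate names are hypothetical.

```python
import ipaddress
import numpy as np
import pandas as pd

# Hypothetical sample data (CIDR kept as plain strings)
df1 = pd.DataFrame({"address": ["8.47.124.1", "63.215.97.10",
                                "8.47.124.9", "63.215.97.2"]})
df2 = pd.DataFrame({"ASN": [1, 2, 1],
                    "CIDR": ["8.47.124.0/29", "63.215.97.8/29", "8.13.232.64/27"]})

# Convert each block to an inclusive integer range and sort by start
nets = [ipaddress.ip_network(c) for c in df2["CIDR"]]
ranges = pd.DataFrame({
    "start": np.array([int(n.network_address) for n in nets], dtype=np.int64),
    "end": np.array([int(n.broadcast_address) for n in nets], dtype=np.int64),
    "ASN": df2["ASN"].to_numpy(),
}).sort_values("start").reset_index(drop=True)

starts = ranges["start"].to_numpy()
ends = ranges["end"].to_numpy()
asns = ranges["ASN"].to_numpy()

# Convert the addresses to integers
ips = np.array([int(ipaddress.ip_address(a)) for a in df1["address"]],
               dtype=np.int64)

# For each address, find the last block whose start is <= the address
pos = np.searchsorted(starts, ips, side="right") - 1
safe = np.clip(pos, 0, None)             # avoid negative indexing
hit = (pos >= 0) & (ips <= ends[safe])   # address must also be <= block end

# Keep the ASN where the address is inside the block, missing otherwise
df1["ASN"] = pd.Series(asns[safe], index=df1.index).where(hit).astype("Int64")
print(df1)
```

The lookup is one sort plus a binary search per address, which should scale far better than comparing every address against every block. Addresses that fall in no block are left as missing rather than assigned to the nearest prefix.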