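I have 2 pandas DataFrames.
I'm trying to look up the addresses from "DF1" in the CIDR field of "DF2", then add the ASN number associated with the matching CIDR block to "DF1". Since I have such a large dataset, is there a good way to do this?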
DF1
address
0 8.47.124.1
1 63.215.97.2
2 8.47.124.2
3 63.215.97.1
4 8.47.124.1
5 8.47.124.2
6 8.47.124.1
7 8.47.124.1
DF2
ASN CIDR
0 1 [IPNetwork('8.47.124.0/29')]
1 1 [IPNetwork('8.45.244.0/29')]
2 2 [IPNetwork('63.215.97.8/29')]
3 1 [IPNetwork('8.13.232.64/27')]
4 2 [IPNetwork('63.215.97.16/29')]
5 2 [IPNetwork('63.215.97.24/29')]
6 1 [IPNetwork('8.13.228.128/27')]
7 1 [IPNetwork('8.13.228.96/27')]
Desired output:
DF1
address asn
0 8.47.124.1 1
1 63.215.97.2 2
2 8.47.124.2 1
3 63.215.97.1 2
4 8.47.124.1 1
5 8.47.124.2 1
6 8.47.124.1 1
7 8.47.124.1 1
This is as close as I've gotten:
import ipaddress
# Create a new column "ASN" in DF1
DF1["ASN"] = ""
# The while loop is meant to use the ipaddress library to check whether an address from "DF1" is in a CIDR block of "DF2"
index = 0
length = len(DF1)
while index < length:
    DF1["address"].iloc[index] in DF2["CIDR"].iloc[index]
    DF1["ASN"].iloc[index] = DF2["ASN"].iloc[index]
    index = index + 1
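But this just fills in an ASN all the way down. I think it is giving me the ASN at the same row position in DF2, rather than the ASN of the row whose CIDR matches the IP from DF1.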
Answer 0: (score: 1)
Consider building a substring of each IP address, from the first digit up to the last period, and then merging the two frames on it:
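import pandas as pd
# Keep everything before the last period, e.g. '8.47.124.1' -> '8.47.124'
DF1['IP_Sub'] = DF1['address'].apply(lambda x: x[0:x.rindex('.')])
# str(x) covers the case where CIDR holds objects rather than strings;
# index 12 skips the leading "[IPNetwork('" of the text shown above
DF2['IP_Sub'] = DF2['CIDR'].apply(lambda x: str(x)[12:str(x).rindex('.')])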
DF2 = DF2[['IP_Sub', 'ASN']].drop_duplicates()
# MERGE DFs
DF3 = pd.merge(DF1, DF2, on='IP_Sub')[['address', 'ASN']]
print(DF3)
# address ASN
# 0 8.47.124.1 1
# 1 8.47.124.2 1
# 2 8.47.124.1 1
# 3 8.47.124.2 1
# 4 8.47.124.1 1
# 5 8.47.124.1 1
# 6 63.215.97.2 2
# 7 63.215.97.1 2
# MERGE DFs (MAINTAIN ORIGINAL INDEX)
DF3 = (DF1.reset_index()
          .merge(DF2, on='IP_Sub', sort=False)
          .filter(['index', 'address', 'ASN'])
          .set_index('index').sort_index()
          .rename_axis(None))
print(DF3)
# address ASN
# 0 8.47.124.1 1
# 1 63.215.97.2 2
# 2 8.47.124.2 1
# 3 63.215.97.1 2
# 4 8.47.124.1 1
# 5 8.47.124.2 1
# 6 8.47.124.1 1
# 7 8.47.124.1 1
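Note that the substring merge above matches on the first three octets only, which is not the same as CIDR membership. In the sample data, 63.215.97.2 and 63.215.97.1 are not inside any of the listed /29 blocks (the nearest one, 63.215.97.8/29, starts at .8), yet the merge assigns them ASN 2. If exact membership is what you need, the following is a minimal sketch using only the standard library. The names `build_asn_lookup` and `lookup` are made up for illustration, and the sketch assumes IPv4 addresses, CIDR blocks given as plain strings, and blocks that do not overlap.

```python
import bisect
import ipaddress


def build_asn_lookup(cidr_asn_pairs):
    """Return a function mapping an IPv4 address string to an ASN, or None.

    Hypothetical helper: assumes the CIDR blocks do not overlap.
    """
    # Turn every block into (first address, last address, ASN) as integers
    ranges = []
    for cidr, asn in cidr_asn_pairs:
        net = ipaddress.ip_network(cidr)
        ranges.append((int(net.network_address), int(net.broadcast_address), asn))
    ranges.sort()
    starts = [start for start, _, _ in ranges]

    def lookup(address):
        value = int(ipaddress.ip_address(address))
        # Locate the last block whose first address is <= the address
        pos = bisect.bisect_right(starts, value) - 1
        if pos >= 0 and value <= ranges[pos][1]:
            return ranges[pos][2]
        return None  # the address is not inside any block

    return lookup


# A few of the blocks from DF2 in the question
lookup = build_asn_lookup([
    ("8.47.124.0/29", 1),
    ("63.215.97.8/29", 2),
    ("63.215.97.16/29", 2),
])

print(lookup("8.47.124.1"))    # 1 (inside 8.47.124.0/29)
print(lookup("63.215.97.10"))  # 2 (inside 63.215.97.8/29)
print(lookup("63.215.97.2"))   # None (below 63.215.97.8)
```

Each lookup is a binary search, so this should scale to a large DF2 better than testing every block for every address. With the frames from the question, something along the lines of DF1['ASN'] = DF1['address'].map(lookup) would apply it, provided the CIDR column is first flattened into (CIDR string, ASN) pairs; how to do that depends on what the column actually holds.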