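I have a Pandas data frame (df) with the columns 'ip', 'country' and 'city'. df has many rows (~50,000) with different IPs. I want to perform a 'whois' lookup for each IP, retrieve additional attributes, and add those attributes to the existing data frame df.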
For example, from the whois results I retrieve 'asn', 'asn_cidr', 'email', 'address' and 'adminName'.
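All five attributes come out of the same lookup_rdap() result, so one way to frame the task is a function that pulls every field out of a single result dict in one pass. This is a minimal sketch, not ipwhois's own API: extract_attributes is a hypothetical name, and it assumes the dict layout ipwhois uses for RDAP results (top-level 'asn', 'asn_cidr', 'entities' and 'objects', where each object's 'contact' holds a 'name' plus 'email'/'address' lists of {'type', 'value'} dicts). Like the question's ac() helper, it collects the names of all entities rather than only the admin role.

```python
from typing import Any, Dict, List, Optional


def _values(items: Optional[List[Dict[str, Any]]]) -> List[str]:
    # Assumed ipwhois layout: emails/addresses are lists of {'type': ..., 'value': ...}
    return [item["value"] for item in (items or []) if item.get("value")]


def extract_attributes(results: Dict[str, Any]) -> Dict[str, Any]:
    """Pull all wanted attributes out of ONE lookup_rdap() result dict (no network)."""
    objects = results.get("objects") or {}
    names: List[str] = []
    emails: List[str] = []
    addresses: List[str] = []
    for handle in results.get("entities") or []:
        # A handle may be missing from 'objects', or its 'contact' may be None
        contact = (objects.get(handle) or {}).get("contact") or {}
        if contact.get("name"):
            names.append(contact["name"])
        emails.extend(_values(contact.get("email")))
        addresses.extend(_values(contact.get("address")))
    return {
        "asn": results.get("asn"),
        "asn_cidr": results.get("asn_cidr"),
        "adminName": names,  # names of all entities, mirroring the question's ac()
        "email": emails,
        "address": addresses,
    }
```

With this, each IP needs only one network call: fetch the result once, pass it to extract_attributes, and use the returned dict as that row's new columns.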
Currently, this is what I am doing (sample code):
(A sample of df['ip'] is shown in the console output at the end of this post.)
from ipwhois import IPWhois

def f(ip):
    # Full RDAP lookup for one IP (used for 'asn' and 'asn_cidr')
    obj = IPWhois(ip, timeout=10)
    results = obj.lookup_rdap(depth=1, retry_count=5, rate_limit_timeout=60)
    return results

def ac(ip):
    # Another full RDAP lookup, used only to collect the contact names
    obj = IPWhois(ip, timeout=10)
    results = obj.lookup_rdap(depth=1, retry_count=5, rate_limit_timeout=60)
    names = []
    for e in results['entities']:
        name = results['objects'][e]['contact']['name']
        names.append(name)
    return names

# email() and address() are not shown; each does its own lookup in the same way
df['adminName'] = df['ip'].map(ac)
df['email'] = df['ip'].map(email)
df['address'] = df['ip'].map(address)
df['asn'] = df['ip'].map(lambda x: f(x)['asn'])
df['asn_cidr'] = df['ip'].map(lambda x: f(x)['asn_cidr'])
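Problem: the code above works, but it performs a lot of redundant lookups for the same IP. For example, to fill the adminName column it goes through all 50,000 IPs, fetches each JSON result, extracts the admin name, fills in the column, and only then moves on to the next attribute, which repeats the same 50,000 lookups, and so on for every attribute.
What I want is to do a single lookup per IP, take the resulting JSON object, parse it for all the required attributes, fill in one row in a new (or the existing) data frame, and then move on to the next IP address. Something like this: [not sure whether this approach is efficient/good...]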
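The one-lookup-per-IP approach described above could look roughly like this: look up each distinct IP once, build a small whois DataFrame from the parsed results, and left-join it back onto df on 'ip'. This is a sketch under assumptions, not a definitive recipe: rdap_attributes and enrich are hypothetical names, only 'asn', 'asn_cidr' and the contact names are parsed (email and address would be added the same way), and the IPWhois/lookup_rdap parameters are copied from the question. The lookup function is passed in as an argument so the join logic can be exercised without network access.

```python
from typing import Any, Callable, Dict, List

import pandas as pd


def rdap_attributes(ip: str) -> Dict[str, Any]:
    """One RDAP lookup for one IP, parsed into all the needed fields at once."""
    from ipwhois import IPWhois  # imported lazily so enrich() can be tested offline

    results = IPWhois(ip, timeout=10).lookup_rdap(
        depth=1, retry_count=5, rate_limit_timeout=60
    )
    objects = results.get("objects") or {}
    names: List[str] = []
    for handle in results.get("entities") or []:
        contact = (objects.get(handle) or {}).get("contact") or {}
        if contact.get("name"):
            names.append(contact["name"])
    return {"asn": results.get("asn"), "asn_cidr": results.get("asn_cidr"), "adminName": names}


def enrich(
    df: pd.DataFrame, lookup: Callable[[str], Dict[str, Any]] = rdap_attributes
) -> pd.DataFrame:
    """Look up each distinct IP exactly once, then join the attributes back onto df."""
    rows = []
    for ip in df["ip"].drop_duplicates():
        try:
            attrs = lookup(ip)
        except Exception as exc:  # ipwhois can raise several error types; keep going
            attrs = {"lookup_error": str(exc)}
        rows.append({"ip": ip, **attrs})
    if not rows:  # empty input: nothing to join
        return df.copy()
    whois_df = pd.DataFrame(rows)
    # Left join keeps every original row (and its order), including repeated IPs
    return df.merge(whois_df, on="ip", how="left")
```

Usage would be df = enrich(df) in place of the five separate map calls. Because merge(how="left") is used, every original row is kept, and an IP whose lookup failed gets NaN in the whois columns plus a message in lookup_error.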
SS-MacBook-Pro:src SS$ python exRepList_0.21.py
0 46.4.123.15
1 222.136.71.19
2 27.254.67.157
3 37.48.125.51
4 153.194.72.226
5 116.117.253.243
6 91.200.12.111
7 60.173.82.156
8 60.28.1.43
9 36.72.228.72
Name: ip, dtype: object
SSs-MacBook-Pro:src SS$
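If you would rather keep the column-by-column map calls from the question, a smaller change is to memoize the lookup with functools.lru_cache, so each distinct IP string hits the network once and every later map call reuses the cached result. A sketch, assuming the question's IPWhois parameters: rdap_lookup is a hypothetical helper name, and email() and address() would be rewritten the same way as ac(). Note that an unbounded cache keeps every result in memory for the whole session, and that exceptions are not cached, so a failed lookup is retried on the next call.

```python
from functools import lru_cache
from typing import Any, Dict, List


@lru_cache(maxsize=None)  # unbounded: every distinct IP's result stays in memory
def rdap_lookup(ip: str) -> Dict[str, Any]:
    """Do the RDAP network call at most once per distinct IP string."""
    from ipwhois import IPWhois  # same call and parameters as in the question

    return IPWhois(ip, timeout=10).lookup_rdap(
        depth=1, retry_count=5, rate_limit_timeout=60
    )


def f(ip: str) -> Dict[str, Any]:
    # Same role as the original f(); the returned dict is shared, so don't mutate it
    return rdap_lookup(ip)


def ac(ip: str) -> List[str]:
    # Same role as the original ac(), but reading from the cached result
    results = rdap_lookup(ip)
    objects = results.get("objects") or {}
    names: List[str] = []
    for handle in results.get("entities") or []:
        contact = (objects.get(handle) or {}).get("contact") or {}
        if contact.get("name"):
            names.append(contact["name"])
    return names
```

The existing df['ip'].map(...) lines can stay unchanged; only the first call per IP does a network request.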