Question

I have a simple Python script that pulls some malware feeds from an open-source API and finds the unique IPs in that list.

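The URL already contains the IPs, but when you fetch it and save it to a local file, you can see an extra \r\n string after each IP, probably because of the line breaks. I am new to Python; am I doing something wrong here?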

import urllib.request
import urllib.parse
import re


url = 'http://www.malwaredomainlist.com/hostslist/ip.txt'
resp = urllib.request.urlopen(url)
ip = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', resp)
malwareIPList = ip.read()
print (malwareIPlist)

The error is: line 223, in findall: return _compile(pattern, flags).findall(string) TypeError: expected string or bytes-like object

Answer 1

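The problem is that you need to call .read() on resp, the object returned by urllib.request.urlopen. As written, re.findall receives the response object itself rather than its contents.

Consider: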

import urllib.request
import urllib.parse
import re


url = 'http://www.malwaredomainlist.com/hostslist/ip.txt'
resp = urllib.request.urlopen(url)
print(resp)

Prints:

<http.client.HTTPResponse object at 0x103a4ccf8>

I think what you are looking for is:

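import re
import urllib.request

url = 'http://www.malwaredomainlist.com/hostslist/ip.txt'
resp = urllib.request.urlopen(url)
# resp.read() returns bytes; decode to str before applying the regex
ip = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', str(resp.read(), 'utf-8'))

print(ip)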

This prints a bunch of IP addresses...
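
If you want to try the decode-then-match step without hitting the network, a minimal sketch along these lines may help. The sample_body bytes and the extract_ips helper are made up for illustration; they only stand in for what resp.read() would return.

```python
import re

# Hypothetical sample data, standing in for the bytes that resp.read() returns
sample_body = b"103.14.120.121\r\n103.19.89.55\r\n103.224.212.222\r\n"

# Same pattern as in the question, compiled once for reuse
IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def extract_ips(body, encoding="utf-8"):
    """Decode a raw response body and return every IPv4-looking match."""
    text = body.decode(encoding)  # bytes -> str, same effect as str(body, encoding)
    return IP_PATTERN.findall(text)


ips = extract_ips(sample_body)
print(ips)
```

Because the pattern only matches digits and dots, the trailing \r\n never ends up in the results.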

By the way, since the data is just IP addresses separated by \r\n, you don't actually need a regex. You can do this:

>>> str(resp.read(), 'utf-8').splitlines()
['103.14.120.121', '103.19.89.55', '103.224.212.222', '103.24.13.91', ...]
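
Since the question also asks for the unique IPs, here is one possible sketch that builds on splitlines(). The sample_body bytes and the unique_ips helper are assumptions for illustration, not part of the original feed or code; dict.fromkeys is used only to drop duplicates while keeping first-seen order.

```python
# Hypothetical sample data with one duplicate and one blank line,
# standing in for the bytes that resp.read() returns
sample_body = (
    b"103.14.120.121\r\n103.19.89.55\r\n103.14.120.121\r\n\r\n103.24.13.91\r\n"
)


def unique_ips(body, encoding="utf-8"):
    """Split a decoded body into lines and return unique, non-empty entries."""
    lines = (line.strip() for line in body.decode(encoding).splitlines())
    # dict keys are unique and keep insertion order, so duplicates are dropped
    return list(dict.fromkeys(line for line in lines if line))


result = unique_ips(sample_body)
print(result)
```

If order does not matter, set(...) would work as well; the dict approach just keeps the list in the order the feed provides.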
