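I've just started using Python 2.7 and am trying to do a fairly basic data manipulation task.
Goal: pull a list from a website and manipulate the data line by line.
However, I'm stuck at the second step: getting the data into an array that I can manipulate later.
I've tried several approaches, but instead of one array entry per line, I end up with one array entry per character. Here is what I've tried so far: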
import urllib2, numpy
from array import array
listraw = urllib2.urlopen("https://zeustracker.abuse.ch/blocklist.php?download=badips").read()
list = [line.rstrip('\n\r') for line in listraw]
array = []
for line in listraw:
    array.append(line)
numpyarray = numpy.asarray(listraw)
lines = tuple(listraw)
#print r'Listraw:'
#print listraw
print 'Single List item: '
print list [10]
print array[10]
print lines[10]
print numpyarray[10]
Output:
Single List item:
#
#
#
Traceback (most recent call last):
  File "./test.py", line 17, in <module>
    print numpyarray[10]
IndexError: too many indices for array
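I tried \n and \r on their own in the list variable, without success. If I uncomment print listraw, it shows the whole list with the line breaks in the right places.
I know I'm missing something basic, because array2 = ["bob","bert","geof"] works, but nothing I've found so far has solved my problem. What is the best way to achieve this?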
Answer 0 (score: 2)
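When you loop over the data, you seem to be confusing strip and split: listraw is a single string, so iterating over it yields one character at a time, and strip only trims the ends of each of those characters. Try: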
data_list = listraw.split('\n')
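To make the difference concrete, here is a minimal sketch using a small in-memory string. The sample text is a hypothetical stand-in for what read() returns, and splitlines() is used here as an alternative to split('\n') because it also copes with \r\n line endings and does not leave a trailing empty entry.

```python
# Hypothetical stand-in for the text returned by urlopen(...).read()
listraw = "# header comment\r\n101.0.89.3\r\n103.19.89.118\r\n"

# Iterating over a string yields single characters, not lines
chars = [item for item in listraw]

# strip() only trims whitespace from the two ends of the whole string;
# the result is still one string, not a list of lines
stripped = listraw.strip()

# splitlines() breaks the string at line boundaries (\n and \r\n alike)
data_list = listraw.splitlines()

print(chars[:3])
print(data_list)
```

With split('\n') on the same sample, each entry would keep its trailing \r and the last entry would be an empty string, so a strip per entry would still be needed.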
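Also, note that I renamed list to data_list, because list is a Python built-in and assigning something to it shadows the built-in, which can lead to unexpected and hard-to-trace bugs later on.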
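As a small illustration of that kind of bug, the sketch below rebinds list inside a function (the function name shadowing_demo is made up for this example) and then tries to use the built-in. The rebinding is kept local to the function so it does not affect anything else.

```python
def shadowing_demo():
    # Within this function, `list` now refers to an ordinary list object,
    # which hides the built-in list type.
    list = ["1.2.3.4", "5.6.7.8"]
    try:
        # Intended: build a list from a string. Actual: calls a list object.
        list("abc")
    except TypeError as exc:
        return str(exc)
    return None

message = shadowing_demo()
print(message)
```

The call fails with a TypeError because a list instance is not callable, and the error shows up far from the assignment that caused it.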
Also, based on @Lukas' comment, you can refactor the code to:
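import urllib2, numpy
# urlopen() returns a file-like object; iterating over it yields one line at a time
listraw = urllib2.urlopen("https://zeustracker.abuse.ch/blocklist.php?download=badips")
array = []
for line in listraw:
    array.append(line.strip())
# build the NumPy array from the collected lines, not from the already-consumed response
numpyarray = numpy.asarray(array)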
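The same loop can be tried without network access. This is only a sketch: io.BytesIO is used as a hypothetical stand-in for the response object, the sample data is invented, and the extra filtering of blank and comment lines is an addition that is not part of the refactored code above.

```python
import io

# Hypothetical stand-in for the object returned by urlopen():
# a file-like object that yields one line of bytes per iteration.
response = io.BytesIO(b"# comment header\n\n101.0.89.3\r\n103.19.89.118\n")

ips = []
for raw_line in response:
    # Decode the bytes and drop surrounding whitespace, including \r\n
    line = raw_line.decode("ascii").strip()
    # Skip blank lines and comment lines (an addition for this sketch)
    if not line or line.startswith("#"):
        continue
    ips.append(line)

print(ips)
```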
Answer 1 (score: 1)
You can use numpy.loadtxt, which skips empty lines and lines starting with #:
import urllib2, numpy
listraw = urllib2.urlopen("https://zeustracker.abuse.ch/blocklist.php?download=badips")
print(numpy.loadtxt(listraw, dtype=str))
Which gives you:
['101.0.89.3' '101.200.81.187' '103.19.89.118' '103.230.84.239'
'103.241.0.100' '103.26.128.84' '103.4.52.150' '103.7.59.135'
'107.179.62.12' '108.174.157.123' '109.127.8.242' '109.229.210.250'
'109.229.36.65' '113.29.230.24' '116.193.77.118' '120.25.63.2'
'120.31.134.133' '120.63.157.195' '123.30.129.179' '124.110.195.160'
'128.210.157.251' '141.105.71.73' '151.97.190.239' '157.7.170.62'
'158.69.114.173' '160.97.52.229' '162.223.94.56' '175.107.192.78'
'177.4.23.159' '180.182.234.200' '185.24.234.108' '185.25.117.49'
'185.25.119.84' '185.25.49.241' '185.80.129.62' '186.64.120.104'
'187.174.252.247' '188.219.154.228' '188.226.141.142' '188.241.140.212'
'188.241.140.222' '188.241.140.224' '188.247.135.53' '188.247.135.58'
'188.247.135.74' '188.247.135.99' '190.123.35.140' '190.123.35.141'
'190.128.29.1' '190.15.192.25' '192.64.11.244' '192.99.148.26'
'192.99.19.4' '193.0.200.185' '193.107.17.145' '193.107.17.55'
'193.107.17.56' '193.107.19.24' '193.107.19.244' '193.146.210.69'
'193.189.117.56' '193.201.227.142' '194.15.112.29' '194.15.112.30'
'194.58.103.199' '195.20.40.123' '195.20.41.233' '195.20.42.1'
'195.20.43.189' '195.20.44.100' '195.20.44.109' '195.20.46.116'
'195.20.47.56' '195.242.161.117' '198.245.202.92' '199.115.228.68'
'199.187.129.193' '199.201.121.185' '199.7.234.100' '201.149.83.183'
'202.144.144.195' '202.29.22.38' '202.29.230.198' '202.67.13.107'
'203.170.193.23' '209.164.84.70' '210.211.108.215' '210.4.76.221'
'212.44.64.202' '213.147.67.20' '216.176.100.240' '216.176.184.21'
'216.215.112.149' '222.29.197.232' '31.28.27.17' '31.7.63.146'
'37.123.99.188' '37.143.11.189' '37.46.134.60' '37.48.108.162'
'37.48.125.119' '46.151.52.191' '46.151.52.61' '46.151.54.46'
'46.4.150.111' '58.195.1.4' '59.157.4.2' '60.13.186.5' '60.241.184.209'
'63.249.152.74' '64.127.71.73' '64.182.215.68' '64.182.6.61' '64.85.233.8'
'64.90.187.131' '78.138.104.167' '78.46.222.241' '80.65.93.241'
'83.15.254.242' '83.212.117.233' '83.222.14.207' '83.69.233.121'
'87.236.210.110' '87.236.210.124' '87.237.198.245' '87.246.143.242'
'87.254.167.37' '89.40.181.101' '91.108.176.118' '91.236.213.74'
'91.236.75.11' '92.53.119.248' '92.53.124.62' '93.171.205.12'
'93.171.205.5' '94.103.36.55' '95.211.153.134' '98.131.185.136']
Appending and stripping every line would also include the lines at the top of the file:
#############################################################################################
# abuse.ch ZeuS IP blocklist "BadIPs" (excluding hijacked sites and free hosting providers) #
# #
# For questions please refer to https://zeustracker.abuse.ch/blocklist.php #
#############################################################################################
which I assume you don't want.
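The comment handling can be checked offline with a small sketch. The sample text below is invented and io.StringIO is a stand-in for the downloaded response; it assumes a reasonably recent NumPy, where dtype=str yields text strings.

```python
import io
import numpy

# Hypothetical sample: two comment lines, one blank line, two addresses
sample = u"# comment header\n#\n\n101.0.89.3\n103.19.89.118\n"

# loadtxt treats '#' as the comment marker by default and ignores blank lines
ips = numpy.loadtxt(io.StringIO(sample), dtype=str)

print(ips)
```

Only the two address lines should remain in the resulting one-dimensional array.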
Answer 2 (score: 0)
Admittedly, the IP check is fairly crude: if a line starts with a digit, it is assumed to be an IP address ;)
import requests
url = 'https://zeustracker.abuse.ch/blocklist.php'
payload = {'download' : 'badips'}
res = requests.get(url, params=payload)
# ignore all empty lines and those that don't start with a number
bad_ips = [item for item in res.text.split('\n') if item and item[0].isdigit()]
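If the crude check turns out to be too permissive, a stricter filter might look like the sketch below. The helper is_ipv4 and the sample text are hypothetical and not part of the answer; the helper only accepts dotted-quad IPv4 addresses with four numeric parts in the range 0 to 255.

```python
DIGITS = "0123456789"

def is_ipv4(text):
    """Return True if text looks like a dotted-quad IPv4 address."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Each part must be non-empty and consist of ASCII digits only
        if not part or any(ch not in DIGITS for ch in part):
            return False
        if int(part) > 255:
            return False
    return True

# Hypothetical stand-in for res.text
text = "# header\n\n101.0.89.3\n3 strikes\n999.1.1.1\n103.19.89.118\n"

bad_ips = [item.strip() for item in text.splitlines() if is_ipv4(item.strip())]
print(bad_ips)
```

Lines such as "3 strikes" start with a digit and would pass the original check, but they are rejected here.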