This is all I have managed so far! I am trying to fetch the list of proxies:
import urllib.request
page = urllib.request.urlopen("http://www.samair.ru/proxy/ip-address-01.htm")
page('\d+\.\d+\.\d+\.\d+')
Answer 0 (score: 6)
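In this case the table is not actually an HTML table; it is plain text wrapped in <pre></pre>.
You can verify this by viewing the page source.
Either way, BeautifulSoup makes this a walk in the park: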
In [1]: from bs4 import BeautifulSoup
In [2]: from urllib.request import urlopen
In [3]: bs = BeautifulSoup(urlopen('http://www.samair.ru/proxy/ip-address-01.htm'), 'html.parser')
In [4]: print(bs.find('pre').text)
IP address Anonymity level Checked time Country
056.249.66.50:8080 transparent Apr-21, 10:33 Bulgaria
1.63.18.22:8080 transparent Apr-21, 05:56 China
1.9.75.8:8080 transparent Apr-21, 12:58 Malaysia
103.247.219.165:8080 transparent Apr-21, 04:01 Indonesia
103.4.165.190:80 transparent Apr-21, 11:34 Indonesia
103.9.126.110:8080 transparent Apr-21, 12:19 Indonesia
109.173.98.64:8080 transparent Apr-20, 22:39 Russian Federation
109.197.194.142:8080 transparent Apr-21, 12:07 Russian Federation
109.207.61.141:8090 transparent Apr-21, 11:14 Poland
109.207.61.145:8090 transparent Apr-21, 13:04 Poland
109.207.61.149:8090 transparent Apr-21, 10:21 Poland
109.207.61.165:8090 transparent Apr-21, 03:57 Poland
109.207.61.170:8090 transparent Apr-21, 11:02 Poland
109.207.61.208:8090 transparent Apr-21, 10:45 Poland
109.224.55.46:80 transparent Apr-20, 21:50 Iraq
109.227.124.105:8080 transparent Apr-21, 09:57 Ukraine
109.69.6.118:8080 transparent Apr-21, 11:44 Albania
110.138.248.135:8080 transparent Apr-21, 09:10 Indonesia
110.139.13.121:8080 transparent Apr-21, 11:31 Indonesia
110.159.179.108:80 transparent Apr-20, 20:35 Malaysia
In [5]: [l.split()[0] for l in bs.find('pre').text.split('\n')[1:]][1:]
Out[5]:
['056.249.66.50:8080',
'1.63.18.22:8080',
'1.9.75.8:8080',
'103.247.219.165:8080',
'103.4.165.190:80',
'103.9.126.110:8080',
'109.173.98.64:8080',
'109.197.194.142:8080',
'109.207.61.141:8090',
'109.207.61.145:8090',
'109.207.61.149:8090',
'109.207.61.165:8090',
'109.207.61.170:8090',
'109.207.61.208:8090',
'109.224.55.46:80',
'109.227.124.105:8080',
'109.69.6.118:8080',
'110.138.248.135:8080',
'110.139.13.121:8080',
'110.159.179.108:80']
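Since the question started from a regular expression, here is a minimal sketch of that route as well, applied to the plain text of the <pre> block rather than to the response object. The names `SAMPLE`, `PROXY_RE` and `extract_proxies` are made up for illustration, and the sample text is a few rows copied from the output above so the snippet runs without a network connection. It assumes every proxy appears as `ip:port` and does not check that each octet is at most 255.

```python
import re

# Hypothetical sample: a few rows copied from the <pre> text printed above.
SAMPLE = """IP address Anonymity level Checked time Country
1.63.18.22:8080 transparent Apr-21, 05:56 China
103.4.165.190:80 transparent Apr-21, 11:34 Indonesia
109.224.55.46:80 transparent Apr-20, 21:50 Iraq
"""

# Four dot-separated groups of 1-3 digits, a colon, then a 1-5 digit port.
# Times such as "05:56" do not match because they lack the dotted quad.
PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b")


def extract_proxies(text):
    """Return a list of (host, port) tuples for every ip:port pair in text."""
    return [(host, int(port)) for host, port in PROXY_RE.findall(text)]


proxies = extract_proxies(SAMPLE)
print(proxies)
```

With a live page you would pass `bs.find('pre').text` (or the decoded response body) to `extract_proxies` instead of `SAMPLE`. Unlike splitting on whitespace, the regex ignores the header row and any blank lines without extra slicing.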