Hoping to get some help from you! I want to scrape usernames from a forum using Python, but I can't figure out how. Here is part of the page markup:
Part 1
<td class="alt2" title="reply: 11,view: 1,097">
<div class="smallfont" style="text-align:right; white-space:nowrap">
2017-03-28 <span class="time">23:44</span><br>
<a href="member.php?find=lastposter&t=1907777" rel="nofollow">username</a> <a href="showthread.php?p=9575713#post9575713"><img class="inlineimg" src="http://s.bbkz.net/forum/images/buttons_style/tc_2/lastpost.gif" alt="last" title="last" border="0"></a>
</div>
</td>
Part 2
<div class="smallfont">
<span style="cursor:pointer" onclick="window.open('member.php?u=353562', '_self')">username</span>
</div>
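Also, the forum links have this format: --
I want to scrape the usernames from this markup across different pages using Python. Could you help me?
Many thanks!!
[Edit - added time.sleep] Should it look like this?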
import requests
from bs4 import BeautifulSoup
import time
url = 'http://www.example.com/forum/forumdisplay.php?f=148&order=desc&page=3'
html_source = requests.get(url).text
soup = BeautifulSoup(html_source, 'html.parser')
a_tags = soup.find_all('a')
for a in a_tags:
    if 'member.php?' in a['href']:
        print(a.text)
time.sleep(10)
Here is the error message:
Traceback (most recent call last):
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\connection.py", line 138, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\util\connection.py", line 98, in create_connection
raise err
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\util\connection.py", line 88, in create_connection
sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 594, in urlopen
chunked=chunked)
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 361, in _make_request
conn.request(method, url, **httplib_request_kw)
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1106, in request
self._send_request(method, url, body, headers)
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1151, in _send_request
self.endheaders(body)
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1102, in endheaders
self._send_output(message_body)
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 934, in _send_output
self.send(msg)
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 877, in send
self.connect()
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\connection.py", line 163, in connect
conn = self._new_conn()
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\connection.py", line 147, in _new_conn
self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x029131F0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\adapters.py", line 423, in send
timeout=timeout
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 643, in urlopen
_stacktrace=sys.exc_info()[2])
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\util\retry.py", line 363, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError:
HTTPConnectionPool(host='www.example.com', port=80): Max retries exceeded with url: /forum/forumdisplay.php?f=148&order=desc&page=3 (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x029131F0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/user/PycharmProjects/untitled/backpackertw_v1.py", line 6, in <module>
html_source = requests.get(url).text
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\api.py", line 70, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\api.py", line 56, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\sessions.py", line 488, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\sessions.py", line 609, in send
r = adapter.send(request, **kwargs)
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\adapters.py", line 487, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError:
HTTPConnectionPool(host='www.example.com', port=80): Max retries exceeded with url: /forum/forumdisplay.php?f=148&order=desc&page=3 (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x029131F0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond',))
Answer 0 (score: 0)
Your code would look something like this:
import requests
from bs4 import BeautifulSoup
url = 'http://www.example.com/forum/forumdisplay.php?f=148&order=desc&page=3'
# Download the page and parse it
html_source = requests.get(url).text
soup = BeautifulSoup(html_source, 'html.parser')
a_tags = soup.find_all('a')
for a in a_tags:
    # .get() avoids a KeyError on <a> tags that have no href attribute
    if 'member.php?' in a.get('href', ''):
        print(a.text)
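Note that a loop over `<a>` tags would pick up the Part 1 markup but not Part 2, where the username sits in a `<span>` whose `onclick` mentions `member.php`. Here is a minimal sketch that covers both cases. It swaps BeautifulSoup for the standard library's `html.parser` purely so that it runs without extra installs; `UsernameParser` and `extract_usernames` are hypothetical names made up for this illustration, and the matching rule (the substring `member.php?`) is an assumption based only on the two snippets in the question.

```python
from html.parser import HTMLParser


class UsernameParser(HTMLParser):
    """Collect the text of <a href="...member.php?..."> and
    <span onclick="...member.php?..."> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.usernames = []
        self._capturing_tag = None  # tag whose text is being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._capturing_tag is not None:
            return  # already inside a matching element
        attrs = dict(attrs)
        if tag == 'a':
            link = attrs.get('href') or ''
        elif tag == 'span':
            link = attrs.get('onclick') or ''
        else:
            return
        # Assumption: profile links always contain 'member.php?'
        if 'member.php?' in link:
            self._capturing_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._capturing_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._capturing_tag:
            name = ''.join(self._buffer).strip()
            if name:
                self.usernames.append(name)
            self._capturing_tag = None


def extract_usernames(html_source):
    """Return the usernames found in one page of HTML, in document order."""
    parser = UsernameParser()
    parser.feed(html_source)
    parser.close()
    return parser.usernames
```

Calling `extract_usernames(html_source)` on the downloaded page text should return a list of names. If you would rather stay with BeautifulSoup, the same test can probably be applied to `a.get('href')` and `span.get('onclick')`.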
Then, to cover more pages, you will need a loop that builds each URL:
i.e.:
for i in range(10):
    # Build the URL for page i; adjust the range if the forum numbers its pages from 1
    url = 'http://www.example.com/forum/forumdisplay.php?f=148&order=desc&page={}'.format(i)
    ###
    # insert the rest of your code here
    ###
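Since the traceback in the question ends in a ConnectionError and the edit asks where `time.sleep` belongs, here is a rough sketch of how the page loop could pause between requests and skip pages that fail to load. The names `build_page_urls` and `scrape_pages` are hypothetical, the 10-second delay is just the value from the question, and the page range is an assumption. It relies on the fact that `requests.exceptions.RequestException` derives from `IOError` (an alias of `OSError` in Python 3), so catching `OSError` also covers the ConnectionError shown above.

```python
import time


def build_page_urls(url_template, first_page, last_page):
    """Fill the page number into the template for each page, inclusive."""
    return [url_template.format(page)
            for page in range(first_page, last_page + 1)]


def scrape_pages(urls, fetch, parse, delay_seconds=10, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests.

    fetch(url) should return the page HTML; parse(html) should return
    whatever you want to keep for that page. Pages that fail to load
    are reported and skipped instead of aborting the whole run.
    """
    results = {}
    for index, url in enumerate(urls):
        if index:  # no pause needed before the very first request
            sleep(delay_seconds)
        try:
            html = fetch(url)
        except OSError as exc:  # includes requests' ConnectionError/Timeout
            print('skipping {}: {}'.format(url, exc))
            continue
        results[url] = parse(html)
    return results
```

With requests installed, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`, and `parse` could be the part of your code that builds the soup and collects the names. Note that the sleep goes inside the page loop, between requests, rather than after the loop that prints the names.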