Max retries exceeded with URL request

Asked: 2019-04-28 02:19:28

Tags: python python-requests

I am trying to scrape this page, and this is the code I am using:

page = get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page")

When I run this code I get this error:

Traceback (most recent call last):
  File "/Users/lakesh/WebScraping/Gold.py", line 46, in <module>
    page = get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page")
  File "/Library/Python/2.7/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/adapters.py", line 511, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='www.uobgroup.com', port=443): Max retries exceeded with url: /online-rates/gold-and-silver-prices.page (Caused by SSLError(SSLError(1, u'[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:590)'),))

I also tried:

page = get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page",verify=False)

That did not work either. I need some guidance.

Full code:

from requests import get
import requests
from requests.exceptions import RequestException
from contextlib import closing
from bs4 import BeautifulSoup
from collections import defaultdict
import json

requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS = 'DES-CBC3-SHA'
page = get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page")
html = BeautifulSoup(page.content, 'html.parser')
result = defaultdict(list)
last_table = html.find_all('table')[-1]
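
For completeness, here is a session-scoped sketch of the same cipher override, built on urllib3's create_urllib3_context instead of patching DEFAULT_CIPHERS globally; the cipher string is only a placeholder and I have not verified it against this server:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.ssl_ import create_urllib3_context

class CipherAdapter(HTTPAdapter):
    # Adapter that offers a custom cipher list only for the session it is mounted on.
    def init_poolmanager(self, *args, **kwargs):
        kwargs['ssl_context'] = create_urllib3_context(ciphers='HIGH:!aNULL:!eNULL')
        return super(CipherAdapter, self).init_poolmanager(*args, **kwargs)

session = requests.Session()
session.mount('https://www.uobgroup.com', CipherAdapter())
page = session.get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page")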

2 Answers:

Answer 0 (score: 0):

I added the verify=False option and also took out the line that sets the ciphers. With those changes, your code sometimes works for me in Python 3. It works once, then seems not to work for a while. My guess is that the site is throttling access, possibly based on the agent signature it sees, to limit bot traffic. I printed last_table when it worked, and this is what I got:

<table class="responsive-table-rates table table-striped table-bordered" id="nova-funds-list-table">
<tbody>
<tr>
<td style="background-color: #002265; text-align: center; color: #ffffff;">DESCRIPTION</td>
<td style="background-color: #002265; text-align: center; color: #ffffff;">CURRENCY</td>
<td style="background-color: #002265; text-align: center; color: #ffffff;">UNIT</td>
<td style="background-color: #002265; text-align: center; color: #ffffff;">BANK SELLS</td>
<td style="background-color: #002265; text-align: center; color: #ffffff;">BANK BUYS</td>
<td style="text-align: left; display: none;"> </td>
<td style="text-align: left; display: none;"> </td>
</tr>
</tbody>
</table>

I dumped the incoming content to a file. When it works, I get readable HTML. When it does not, I get a few readable lines at the top followed by a lot of gibberish, probably some complicated Javascript. Not sure what that is. When it does not work, I get:

Traceback (most recent call last):
  File "/Users/stevenjohnson/lab/so/ReadAFile.py", line 8, in <module>
    last_table = html.find_all('table')[-1]
IndexError: list index out of range

Either way, I get a 200 status code.

Here is my version of the code:

from requests import get
from bs4 import BeautifulSoup
from collections import defaultdict

page = get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page", verify=False)
html = BeautifulSoup(page.content, 'html.parser')
result = defaultdict(list)
last_table = html.find_all('table')[-1]
print(last_table)

I am on a Mac. Maybe you are not, and the certificate chain on your machine differs from the one on mine, so you may not be able to get as far as I did. But I wanted to let you know that your code only worked for me with verify=False.
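
Since my guess above is that the throttling may be tied to the client signature, one more thing worth trying is sending browser-like headers; this is only a sketch with an example User-Agent string, and I have not confirmed it avoids the intermittent failures:

from requests import get

# Example browser-like headers; the exact User-Agent value is just a placeholder.
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml',
}

page = get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page", headers=headers, verify=False)
print(page.status_code)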

Answer 1 (score: 0):

Set verify to False. Note that this does not check the validity of the certificate; in other programs it could expose your machine to hackers!

Try again with the following code:

page = get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page", verify=False)
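
With verify=False, requests will also print an InsecureRequestWarning on every call. If you want to silence that warning, here is a minimal sketch (note it does not make the connection any safer):

import requests

# Suppress the InsecureRequestWarning that verify=False triggers on each request.
requests.packages.urllib3.disable_warnings(
    requests.packages.urllib3.exceptions.InsecureRequestWarning)

page = requests.get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page", verify=False)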