Question

I am trying to access data online with some web scraping (BeautifulSoup). However, I cannot seem to set up the proxy correctly.

import requests
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup as soup
from urllib import request as urlrequest
from urllib.request import urlopen as uReq

proxies = {'http': 'webproxy.tentrum.com','https': 'http:webproxy.tentrum.co.uk:8080'}

#OpenURL
url = requests.get('https://www.investing.com/rates-bonds/australia-1-year-bond-yield-historical-data',proxies=proxies, headers={'User-Agent': 'Mozilla/5.0'})

data = np.array([])


#DETERMINE FORMAT
content_page = soup(url.content,'html.parser')

containers = content_page.findAll('table', {'class':'genTbl closedTbl historicalTbl'})
for table in containers:
    for td in table.findAll('td'):
        #print(td.text)
        data = np.append(data, td.text)

data

I get the following error message. My internet proxy is webproxy.tentrum.com and the port is 8080. Have I defined it incorrectly?

ProxyError: HTTPSConnectionPool(host='www.investing.com', port=443): Max retries exceeded with url: /rates-bonds/australia-1-year-bond-yield-historical-data (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x000001E6ECACDE48>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',)))
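
One thing that stands out in the code above: the two entries in the proxies dict do not match the proxy described in the question. The 'http' entry has no scheme and no port, and the 'https' entry uses 'http:' without the '//' and points at a different domain (.co.uk instead of .com). requests expects each value to be a full proxy URL of the form scheme://host:port. Below is a minimal sketch of a consistent dict, assuming the proxy really is webproxy.tentrum.com on port 8080 and accepts plain HTTP connections (both taken from the question, not verified). The helper name build_proxies is hypothetical, made up for illustration.

```python
def build_proxies(host, port, scheme="http"):
    """Return a proxies dict in the shape requests expects.

    Hypothetical helper: the same proxy URL is used for both
    http and https traffic, written as scheme://host:port.
    """
    proxy_url = f"{scheme}://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


# Assumption: host and port as stated in the question.
proxies = build_proxies("webproxy.tentrum.com", 8080)
print(proxies)
```

The resulting dict can be passed to the existing call unchanged, i.e. requests.get(..., proxies=proxies, headers={'User-Agent': 'Mozilla/5.0'}). If the connection is still refused with WinError 10061 after this, the host or port is probably not the one the network actually uses, or the proxy requires authentication, and that would need to be checked with whoever runs the proxy.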

python internet-access web-proxy

0 answers: