I am trying to access data online through some web scraping (BeautifulSoup). However, I can't seem to set up the proxy correctly.
import requests
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup as soup
from urllib import request as urlrequest
from urllib.request import urlopen as uReq
proxies = {'http': 'webproxy.tentrum.com','https': 'http:webproxy.tentrum.co.uk:8080'}
# Open the URL through the proxy
url = requests.get('https://www.investing.com/rates-bonds/australia-1-year-bond-yield-historical-data',proxies=proxies, headers={'User-Agent': 'Mozilla/5.0'})
data = np.array([])
# Parse the page and locate the historical-data table
content_page = soup(url.content,'html.parser')
containers = content_page.findAll('table', {'class':'genTbl closedTbl historicalTbl'})
for table in containers:
    for td in table.findAll('td'):
        #print(td.text)
        data = np.append(data, td.text)
data
I get the following error message. My internet proxy is webproxy.tentrum.com and the port is 8080. Did I define it incorrectly?
ProxyError: HTTPSConnectionPool(host='www.investing.com', port=443): Max retries exceeded with url: /rates-bonds/australia-1-year-bond-yield-historical-data (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x000001E6ECACDE48>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',)))
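For comparison, requests expects each value in the proxies mapping to be a full proxy URL with scheme, host, and port. In the dictionary above, the 'http' entry has no scheme or port, and the 'https' entry is missing the "//" after "http:" and points at a different host (.co.uk). That may be one cause of the refused connection, though it cannot be confirmed from the traceback alone. Below is a minimal sketch, assuming the proxy really is reachable at webproxy.tentrum.com on port 8080 and speaks plain HTTP; build_proxies and is_well_formed are hypothetical helper names, not part of requests.

```python
from urllib.parse import urlsplit


def build_proxies(host, port, scheme="http"):
    """Return a requests-style proxies mapping made of full proxy URLs."""
    proxy_url = f"{scheme}://{host}:{port}"
    # Send both plain-HTTP and HTTPS traffic through the same proxy.
    return {"http": proxy_url, "https": proxy_url}


def is_well_formed(proxy_url):
    """Check that a proxy URL carries a scheme, a host name, and a port."""
    parts = urlsplit(proxy_url)
    return bool(parts.scheme and parts.hostname and parts.port)


# Assumption: host and port are the ones stated in the question.
proxies = build_proxies("webproxy.tentrum.com", 8080)
```

The resulting dictionary can be passed as proxies=proxies to requests.get exactly as in the original call. If the connection is still refused, the proxy address, port, or authentication requirements would need to be checked with whoever runs the network.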