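I am trying to fetch the HTML of an onion site using the requests library (or urllib.request). I have tried several approaches, but none of them seem to work.
First, I simply tried to use requests to connect through the proxy and fetch the HTML of Facebook's onion site: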
import requests
session = requests.session()
session.proxies = {}
session.proxies['http'] = 'socks5h://localhost:9050'
session.proxies['https'] = 'socks5h://localhost:9050'
r = requests.get('https://facebookcorewwwi.onion/')
print(r.text)
However, when I do this, the connection through the proxy does not work (my IP stays the same with or without the proxy).
I get the following error:
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='facebookcorewwwi.onion', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x109e8b198>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))
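For comparison, proxies assigned to a Session object only apply to requests sent through that same object; the module-level requests.get call does not read them. Below is a minimal sketch of the session-based variant, not a confirmed fix. It assumes a Tor SOCKS proxy is actually listening on localhost:9050 and that SOCKS support for requests is installed (the requests[socks] extra, which pulls in PySocks). The helper name make_tor_session is hypothetical.

```python
import requests

# Same proxy address as in the snippet above; socks5h means hostnames
# are resolved by the proxy, which .onion addresses require.
TOR_PROXY = "socks5h://localhost:9050"


def make_tor_session(proxy_url: str = TOR_PROXY) -> requests.Session:
    """Return a Session whose HTTP and HTTPS traffic goes through proxy_url."""
    session = requests.Session()
    session.proxies.update({"http": proxy_url, "https": proxy_url})
    return session


if __name__ == "__main__":
    session = make_tor_session()
    # session.get (not requests.get), so that session.proxies is applied.
    r = session.get("https://facebookcorewwwi.onion/", timeout=60)
    print(r.status_code)
```

The request under the main guard needs a running Tor proxy; building the session itself does not touch the network.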
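After some research, I found that someone had tried something similar, and the solution was to connect to the proxy before importing the requests / urllib.request library.
So I tried connecting with the socks and socket libraries: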
import socks
import socket
def create_connection(address, timeout=None, source_address=None):
    sock = socks.socksocket()
    sock.connect(address)
    return sock
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9050)
# patch the socket module
socket.socket = socks.socksocket
socket.create_connection = create_connection
import urllib.request
with urllib.request.urlopen('https://facebookcorewwwi.onion/') as response:
    html = response.read()
    print(html)
When I run this, the connection to the proxy is refused:
urllib.error.URLError: <urlopen error Error connecting to SOCKS5 proxy 127.0.0.1:9050: [Errno 61] Connection refused>
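"Connection refused" on 127.0.0.1:9050 usually means nothing is accepting TCP connections on that port. A minimal sketch for checking this with only the standard library follows. The helper name is_port_open is hypothetical, and 9150 is included only because the Tor Browser bundle typically listens there rather than on 9050.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 9050 is the port used in the snippets above; 9150 is an assumption
    # for setups that run the Tor Browser bundle instead of a tor daemon.
    for port in (9050, 9150):
        print(port, is_port_open("127.0.0.1", port))
```

This should be run in a fresh interpreter, without the socket patching shown above, so that the check itself does not go through the SOCKS wrapper.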
I also tried using the requests library instead, as follows (just replacing the line that says import urllib.request):
import requests
r = requests.get('https://facebookcorewwwi.onion/')
print(r.text)
But here I get this error:
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='facebookcorewwwi.onion', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x10d93ee80>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))
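It seems that no matter what I do, I cannot establish a connection through the proxy. Does anyone have an alternative approach or a way to fix this?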