Question

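I wrote two Python scripts, one using selenium and the other using requests, that connect to http://check.torproject.org through Tor and read the text "Congratulations. This browser is configured to use Tor." from the page, to confirm that I am doing things the right way.

When I use the following script, I get the text without any trouble: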

from selenium import webdriver
import os

torexe = os.popen(r"C:\Users\WCS\Desktop\Tor Browser\Browser\TorBrowser\Tor\tor.exe")

options = webdriver.ChromeOptions()
options.add_argument('--proxy-server=socks5://localhost:9050')
driver = webdriver.Chrome(chrome_options=options)

driver.get("http://check.torproject.org")
item = driver.find_element_by_css_selector("h1.not").text
print(item)

driver.quit()

However, when I try to do the same with requests, I get the error AttributeError: 'NoneType' object has no attribute 'text':

import requests
from bs4 import BeautifulSoup
import os

torexe = os.popen(r"C:\Users\WCS\Desktop\Tor Browser\Browser\TorBrowser\Tor\tor.exe")

with requests.Session() as s:
    s.proxies['http'] = 'socks5://localhost:9050'
    res = s.get("http://check.torproject.org")
    soup = BeautifulSoup(res.text,"lxml")
    item = soup.select_one("h1.not").text
    print(item)

How can I get the same text from that site using requests?

When I use print(soup.title.text), I get the text "Sorry. You are not using Tor.", which clearly shows that the request is not going through Tor.

Answer 1

check.torproject.org forces HTTPS, so when requests follows the redirect to https://check.torproject.org, you are no longer using the SOCKS proxy, because it was only specified for the http protocol.
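
To illustrate why the redirected HTTPS request bypasses the proxy, here is a simplified model of a per-scheme proxy lookup. The helper pick_proxy is hypothetical and is not requests' actual code; it only assumes that the proxy is chosen by the URL scheme, as described above.

```python
from urllib.parse import urlsplit


def pick_proxy(url, proxies):
    """Hypothetical, simplified model: choose the proxy by the URL's scheme."""
    return proxies.get(urlsplit(url).scheme)


# The mapping from the question only has an entry for plain HTTP.
only_http = {"http": "socks5://localhost:9050"}

print(pick_proxy("http://check.torproject.org", only_http))   # socks5://localhost:9050
print(pick_proxy("https://check.torproject.org", only_http))  # None
```

The second lookup finds no entry, which corresponds to the redirected request going out directly instead of through Tor.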

Make sure to set the proxy for both HTTP and HTTPS. Also, to resolve DNS names through Tor without leaking DNS requests, use socks5h.

s.proxies['http']  = 'socks5h://localhost:9050'
s.proxies['https'] = 'socks5h://localhost:9050'

This should make your test work.
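
Combined with the script from the question, a minimal sketch might look like the following. The helper names tor_proxies and tor_check_title are made up for illustration. The sketch assumes Tor is already listening on localhost:9050 and that requests was installed with SOCKS support (pip install "requests[socks]").

```python
def tor_proxies(host="localhost", port=9050):
    """Return a requests-style proxies mapping that covers both schemes."""
    # socks5h (not socks5) asks the proxy to resolve hostnames,
    # so DNS lookups also go through Tor.
    proxy = f"socks5h://{host}:{port}"
    return {"http": proxy, "https": proxy}


def tor_check_title(url="https://check.torproject.org"):
    """Fetch the Tor check page through the proxy and return its <title> text."""
    # Imported here so tor_proxies() stays usable without third-party packages.
    import requests
    from bs4 import BeautifulSoup

    with requests.Session() as s:
        s.proxies.update(tor_proxies())
        res = s.get(url, timeout=30)
        res.raise_for_status()
        soup = BeautifulSoup(res.text, "lxml")
        return soup.title.text.strip()
```

With Tor running, print(tor_check_title()) should no longer report "Sorry. You are not using Tor."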
