Question

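I know there are many threads discussing this problem, but I have tried many of the suggested solutions and none of them seems to work. I will be very specific so that you can help me.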

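I am trying to scrape a website using Selenium with Python 3 on Windows 10. The website blocks me after a certain number of requests, so my idea is that if I use Tor as the Selenium web driver, I can simply ask Tor for a new identity (which means a different IP) every certain number of requests.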

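The following code lets me scrape using the Tor Firefox profile in the Tor Browser folder. The only thing missing from this code is that I cannot request a new identity (a new IP).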

from selenium import webdriver

# Use the Firefox profile that ships inside the Tor Browser folder
profiler = webdriver.FirefoxProfile(r"C:\Users\Samir\Desktop\Tor Browser\Browser\TorBrowser\Data\Browser\profile.default")

profiler.set_preference("network.proxy.type", 1)
profiler.set_preference("network.proxy.socks",'127.0.0.1')
profiler.set_preference("network.proxy.socks_port",9050)

driver = webdriver.Firefox(firefox_profile=profiler)

driver.implicitly_wait(15)
driver.get("The URL I want to scrape")

#Extract whatever information i want from the URL
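
To make the goal concrete, the rotation I have in mind could be sketched roughly as below. This is only an illustration of the idea, not working Tor code: `scrape_with_rotation`, `fetch`, `renew_identity` and `requests_per_identity` are hypothetical names I made up, and the two callables stand in for the Selenium page load and for whatever ends up requesting the new identity.

```python
from typing import Callable, Iterable, List


def scrape_with_rotation(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    renew_identity: Callable[[], None],
    requests_per_identity: int = 10,
) -> List[str]:
    """Fetch every URL, asking for a new identity after each full batch.

    fetch and renew_identity are placeholders (assumptions): fetch could
    wrap driver.get(url) and return driver.page_source, renew_identity
    could send the NEWNYM signal to Tor.
    """
    if requests_per_identity < 1:
        raise ValueError("requests_per_identity must be at least 1")

    results = []
    for count, url in enumerate(urls, start=1):
        results.append(fetch(url))
        # After every full batch of requests, ask for a new identity
        # before moving on to the next URL.
        if count % requests_per_identity == 0:
            renew_identity()
    return results
```

With the driver above, `fetch` might be something like `lambda url: (driver.get(url), driver.page_source)[1]`. The part I cannot get working is `renew_identity`.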

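I tried using the Stem library to get a new identity, but that does not seem to work with the Tor Firefox profile from the code above. It does work, however, if I open the browser by double-clicking the Tor Browser shortcut icon that was created when I installed Tor.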

#This is the code in stem that gets new identity using Stem. As I said, 
#this does not work with the selenium firefox profile for Tor. 

from stem import Signal
from stem.control import Controller

# Connect to Tor's control port, authenticate, and ask for a new circuit
with Controller.from_port(port=9051) as controller:
    controller.authenticate()
    controller.signal(Signal.NEWNYM)
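
A slightly more defensive variant of the Stem snippet above might look like the sketch below. Tor rate-limits NEWNYM requests, and Stem's `Controller` exposes `is_newnym_available()` and `get_newnym_wait()` for checking that. The helper names `wait_then_signal` and `renew_tor_identity` are my own, and the control port is an assumption: 9051 is the value from my snippet, while the Tor bundled with Tor Browser is commonly reported to listen on 9151 for control (and 9150 for SOCKS), so the right value may differ depending on which Tor process is actually running.

```python
import time


def wait_then_signal(controller, signal, sleep=time.sleep):
    """Send `signal` once Tor is willing to accept a new NEWNYM request.

    `controller` is expected to be an authenticated stem Controller.
    Returns the number of seconds waited before sending the signal.
    """
    waited = 0.0
    if not controller.is_newnym_available():
        # Tor rate-limits NEWNYM, so wait for the time it reports.
        waited = controller.get_newnym_wait()
        sleep(waited)
    controller.signal(signal)
    return waited


def renew_tor_identity(control_port=9051, password=None):
    """Open the control port (assumed value) and request a new identity."""
    # Imported here so wait_then_signal can be tried without stem installed.
    from stem import Signal
    from stem.control import Controller

    with Controller.from_port(port=control_port) as controller:
        controller.authenticate(password=password)
        return wait_then_signal(controller, Signal.NEWNYM)
```

Calling `renew_tor_identity()` needs a running Tor process with its control port enabled, which is exactly the part I am unsure about when the browser is started through Selenium.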

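OK, to sum up: is there a way to get a new IP with the code I showed above? Or is there something else I can do to achieve what I want using Python 3, Selenium, and Tor on Windows 10, along with any other library or whatever else is necessary?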

If you have any questions or need more information, please let me know.

Thanks a lot!!!

Scraping a website using Selenium and Tor with Python 3 on Windows 10

0 answers: