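I'm trying to use Selenium to generate a list of URLs. I'd like the user to browse in the instrumented browser, and at the end I want to build a list of the URLs they visited.
I found that the "current_url" property helps with this, but I haven't found a way to know when the user has clicked a link.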
In [117]: from selenium import webdriver
In [118]: browser = webdriver.Chrome()
In [119]: browser.get("http://stackoverflow.com")
--> here, I click on the "Questions" link.
In [120]: browser.current_url
Out[120]: 'http://stackoverflow.com/questions'
--> here, I click on the "Jobs" link.
In [121]: browser.current_url
Out[121]: 'http://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab'
Any hints appreciated!
Thanks,
Answer 0 (score: 2)
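There is currently no official way to monitor user actions in Selenium. The only thing you can do is start the driver and then run a loop that keeps checking driver.current_url. However, I don't know what the best way to exit this loop is, since I don't know your use case. Maybe try something like: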
import time
from selenium import webdriver

urls = []
driver = webdriver.Firefox()
current = 'http://www.google.com'
driver.get('http://www.google.com')
while True:
    if driver.current_url != current:
        current = driver.current_url
        # if you want to capture every URL, including duplicates:
        urls.append(current)
        # or, if you only want to capture unique URLs, use this instead:
        # if current not in urls:
        #     urls.append(current)
    time.sleep(0.5)  # pause between checks so the loop does not spin the CPU
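The polling idea above can also be factored into a small function that does not depend on Selenium, which makes it easier to try out without a browser. This is only a sketch: the names `record_url_changes`, `get_url`, `should_stop`, `poll_interval` and `unique` are made up for illustration and are not part of Selenium.

```python
import time
from typing import Callable, List, Optional


def record_url_changes(
    get_url: Callable[[], str],
    should_stop: Callable[[], bool],
    poll_interval: float = 0.5,
    unique: bool = False,
) -> List[str]:
    """Poll get_url() until should_stop() returns True and return the URLs seen."""
    urls: List[str] = []
    current: Optional[str] = None
    while not should_stop():
        url = get_url()  # read once per iteration so the value cannot change mid-check
        if url != current:
            current = url
            # with unique=True a URL is stored only the first time it appears
            if not unique or url not in urls:
                urls.append(url)
        time.sleep(poll_interval)
    return urls
```

With a real driver, `get_url` would presumably be something like `lambda: driver.current_url`, and `should_stop` whatever exit condition fits your use case.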
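If you have no idea how to end this loop, I suggest having the user navigate to a URL that breaks the loop, for example http://www.endseleniumcheck.com, and adding this check to the code: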
import time
from selenium import webdriver

urls = []
driver = webdriver.Firefox()
current = 'http://www.google.com'
driver.get('http://www.google.com')
while True:
    # the browser may report the address with a trailing slash, so match on the prefix
    if driver.current_url.startswith('http://www.endseleniumcheck.com'):
        break
    if driver.current_url != current:
        current = driver.current_url
        # if you want to capture every URL, including duplicates:
        urls.append(current)
        # or, if you only want to capture unique URLs, use this instead:
        # if current not in urls:
        #     urls.append(current)
    time.sleep(0.5)  # pause between checks so the loop does not spin the CPU
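Comparing the full URL string is fragile, because the browser may add a trailing slash, switch the scheme, or append a path. A slightly more forgiving check compares only the host name. The helper below is a sketch using the standard library; the name `is_stop_url` is hypothetical, and it assumes the browser still reports the stop address in `current_url` even if that page fails to load, which may vary by browser.

```python
from urllib.parse import urlsplit


def is_stop_url(url: str, stop_host: str = "www.endseleniumcheck.com") -> bool:
    """Return True if url points at stop_host, ignoring scheme, path, query and case."""
    host = urlsplit(url).hostname  # None when the URL has no network location
    return host is not None and host == stop_host.lower()
```

In the loop above, the exit test would then read `if is_stop_url(driver.current_url): break`.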
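Or, if you want to be sneaky, you can terminate the loop when the user quits the browser. You can do this by monitoring the browser's process ID with the psutil library (pip install psutil):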
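import time
import psutil
from selenium import webdriver

urls = []
driver = webdriver.Firefox()
# Browser PID as exposed by older Selenium Firefox drivers; geckodriver-based
# setups report it as driver.capabilities['moz:processID'] instead.
pid = driver.binary.process.pid
current = 'http://www.google.com'
driver.get('http://www.google.com')
while True:
    # stop as soon as the browser process no longer exists
    if not psutil.pid_exists(pid):
        break
    if driver.current_url != current:
        current = driver.current_url
        # if you want to capture every URL, including duplicates:
        urls.append(current)
        # or, if you only want to capture unique URLs, use this instead:
        # if current not in urls:
        #     urls.append(current)
    time.sleep(0.5)  # pause between checks so the loop does not spin the CPU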
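If you would rather not depend on psutil, another option is to treat an exception from the URL lookup as the signal that the browser is gone. The sketch below keeps Selenium out of the function so it can be run on its own; `record_until_error`, `get_url` and `closed_errors` are hypothetical names. Which exception actually surfaces when the window is closed depends on the browser and driver version, so the Selenium usage mentioned afterwards is an assumption to verify.

```python
import time
from typing import Callable, List, Optional, Tuple, Type


def record_until_error(
    get_url: Callable[[], str],
    closed_errors: Tuple[Type[BaseException], ...] = (Exception,),
    poll_interval: float = 0.5,
) -> List[str]:
    """Record URL changes until get_url() raises one of closed_errors."""
    urls: List[str] = []
    current: Optional[str] = None
    while True:
        try:
            url = get_url()
        except closed_errors:
            break  # interpreted as "the browser has been closed"
        if url != current:
            current = url
            urls.append(url)
        time.sleep(poll_interval)
    return urls
```

With Selenium this might be called as `record_until_error(lambda: driver.current_url, (WebDriverException,))`, with `WebDriverException` imported from `selenium.common.exceptions`.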