Question

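I have a text file named all_urls.txt that contains a list of URLs, one URL per line. I want to pass this list to Selenium (Python) to extract specific data. I can do this by using the URLs one at a time, but that is not efficient. My current code is as follows: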

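from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile

profile = FirefoxProfile('/home/test/.mozilla/firefox/mfgrtrtr.Default3')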
browser = webdriver.Firefox(firefox_profile=profile)
browser.maximize_window()
# get website
browser.get('https://www.some-website.com/')
# get current url
print browser.current_url
# get name & get phone number
name = browser.find_element_by_class_name("name")
print name.text
phone = browser.find_element_by_class_name("phone")
print phone.text

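How can I pass the list to browser.get and extract the name and phone number from each URL? Thanks in advance for your help. I am new to Python but enjoying the challenge.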

Answer 1

You probably need a for loop, which can iterate over the list. Your code should look like this:

profile = FirefoxProfile('/home/test/.mozilla/firefox/mfgrtrtr.Default3')
browser = webdriver.Firefox(firefox_profile=profile)
browser.maximize_window()
with open("your_file_name") as in_file:
    for url in in_file:
        # get website
        browser.get(url.strip())
        # get current url
        print browser.current_url
        # get name & get phone number
        name = browser.find_element_by_class_name("name")
        print name.text
        phone = browser.find_element_by_class_name("phone")
        print phone.text

The strip() method call on the URL just ensures that it has no leading or trailing whitespace; lines read in from a file normally include a trailing newline.
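For readers on Python 3 with a recent Selenium 4 release, where print is a function and the find_element_by_class_name helper is no longer available, the same loop might be sketched roughly as below. This is only an illustration under some assumptions: read_urls, scrape and main are hypothetical helper names, the Firefox profile from the question is left out, and the pages are assumed to expose elements with the class names "name" and "phone" as in the question.

```python
def read_urls(path):
    """Return the non-empty lines of the file at `path`, stripped of whitespace."""
    with open(path) as in_file:
        return [line.strip() for line in in_file if line.strip()]


def scrape(browser, urls):
    """Visit each URL and collect (url, name, phone) tuples.

    `browser` is any object offering Selenium 4's `get` and `find_element`.
    """
    results = []
    for url in urls:
        browser.get(url)
        # "class name" is the string value of Selenium's By.CLASS_NAME.
        name = browser.find_element("class name", "name").text
        phone = browser.find_element("class name", "phone").text
        results.append((url, name, phone))
    return results


def main():
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium import webdriver

    browser = webdriver.Firefox()
    try:
        for url, name, phone in scrape(browser, read_urls("all_urls.txt")):
            print(url, name, phone)
    finally:
        # Close the browser even if one of the pages raises an error.
        browser.quit()
```

Calling main() would open Firefox and print one line per URL. Reading the file into a list first keeps the blank-line and whitespace handling separate from the browser work.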

Answer 2

Open the file:

my_file = open("all_urls.txt", "r")

Iterate over it and call the get function on each URL:

for url in my_file:
    browser.get(url.strip())  # drop the trailing newline before loading
    print ...
    print ...
my_file.close()

Python Selenium: using a list of URLs
