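I am trying to extract some information from Fortune's 100 Best Companies to Work For list.
I go through each company's page in turn and extract the information. Here is the code: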
import datetime
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from urllib.request import urlopen
from selenium import webdriver
import time
init_url='http://fortune.com/best-companies/google-alphabet-1/'
i=1
while i<=4:
    page=urlopen(init_url)
    soup=BeautifulSoup(page,'html.parser')
    first_table=soup.find('table',{"class":"company-data-table"})
    th1=first_table.find('th',text='Industry')
    td1=th1.findNext('td')
    print(td1.text)
    th2=first_table.find('th',text='Type of organization')
    td2=th2.findNext('td')
    print(td2.text)
    driver=webdriver.Firefox()
    driver.get(init_url)
    time.sleep(5)
    elem1=driver.find_element_by_link_text("Next Company")
    elem1.click()
    init_url=driver.current_url
    driver.quit()
    i+=1
However, this code keeps giving me this error:
Traceback (most recent call last):
  File "C:/Users/pc/Desktop/panda_try.py", line 28, in <module>
    elem1.click()
  File "C:\Users\pc\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\remote\webelement.py", line 77, in click
    self._execute(Command.CLICK_ELEMENT)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\remote\webelement.py", line 494, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.ElementNotVisibleException: Message: Element is not visible
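How should I fix this? I am racing against time on this, so any help would be much appreciated. Thanks!
Answer 0 (score: 1)
There are multiple elements matching the "link text" locator. You should filter for the visible link and then click it: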
for link in driver.find_elements_by_link_text("Next Company"):
    if link.is_displayed():
        link.click()
        break
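The filtering step can also be wrapped in a small helper. The sketch below is only an illustration: `first_displayed` and `FakeLink` are hypothetical names, and `FakeLink` is a stand-in that models nothing but `is_displayed()`, so the example runs without a browser.

```python
def first_displayed(elements):
    """Return the first element whose is_displayed() is true, or None."""
    return next((el for el in elements if el.is_displayed()), None)


class FakeLink:
    """Hypothetical stand-in for a Selenium WebElement (only is_displayed())."""

    def __init__(self, name, visible):
        self.name = name
        self.visible = visible

    def is_displayed(self):
        return self.visible


# Two links share the same text, but only the second one is visible.
links = [FakeLink("hidden copy", False), FakeLink("visible copy", True)]
target = first_displayed(links)
print(target.name)
```

With a real driver, the list would come from driver.find_elements_by_link_text("Next Company"), and the result should be checked for None before calling click().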
Alternatively, another approach that may help, and that by extension replaces the unreliable time.sleep(), is an Explicit Wait with the element_to_be_clickable expected condition:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver.get(init_url)
wait = WebDriverWait(driver, 10)
link = wait.until(EC.element_to_be_clickable((By.LINK_TEXT, "Next Company")))
link.click()
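To show why an explicit wait is more reliable than a fixed time.sleep(), here is a minimal pure-Python sketch of the polling idea behind it. This is not Selenium's implementation: `wait_until` and `link_ready` are hypothetical names, and `link_ready` merely simulates a link that becomes clickable on the third check.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # stop as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Simulated condition: "not ready" twice, then ready on the third call.
calls = {"count": 0}


def link_ready():
    calls["count"] += 1
    return "Next Company" if calls["count"] >= 3 else None


found = wait_until(link_ready, timeout=2.0, poll_interval=0.01)
print(found, calls["count"])
```

The wait returns as soon as the condition holds, so a fast page costs almost nothing, while a slow page gets up to the full timeout before an error is raised. A fixed sleep is either too long or too short.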
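Answer 1 (score: 0)
What I would use in this case is an XPath selector with a WebDriverWait for the element that has to be clicked. I also made a few changes, such as loading the browser only once, which makes the task run faster. In my case, Selenium would not run with the latest Firefox, so I used an older Selenium version (2.49) and Firefox 33, which is set through FirefoxBinary when the web driver is loaded.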
from bs4 import BeautifulSoup
from urllib2 import urlopen  # Python 2; on Python 3 use: from urllib.request import urlopen
from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
init_url = 'http://fortune.com/best-companies/google-alphabet-1/'
next_company_xpath = "//article[contains(@class, 'current')]//div[contains(@class, 'pagination')]//a[contains(.,'Next Company')]"
# Load the webdriver
driver = webdriver.Firefox(firefox_binary=FirefoxBinary('firefox/firefox'))
driver.set_window_size(1980, 1080)
driver.get(init_url)
i = 1
while i <= 4:
    page = urlopen(init_url)
    soup = BeautifulSoup(page, 'html.parser')
    first_table = soup.find('table', {"class": "company-data-table"})
    th1 = first_table.find('th', text='Industry')
    td1 = th1.findNext('td')
    print(td1.text)
    th2 = first_table.find('th', text='Type of organization')
    td2 = th2.findNext('td')
    print(td2.text)
    wait = WebDriverWait(driver, 10)
    link = wait.until(EC.element_to_be_clickable((By.XPATH, next_company_xpath)))
    link.click()
    init_url = driver.current_url
    i += 1
driver.close()