Question

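I'm using the Selenium library in Python to scrape a website that is rendered with JavaScript. My strategy is to navigate the site with Selenium and, at the appropriate moments, search the page with BeautifulSoup. This works well in simple tests, except when I need to click on the calendar's "<" button.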

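The button's "class" changes on hover, so I use ActionChains to move to the element and then click it (I also use sleep to give the browser enough time to load the page). Python doesn't throw any exception, but the click has no effect (i.e. the calendar doesn't move backwards).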

Below are the website and the code I wrote, with an example. Do you know why this happens and/or how I can overcome it? Thank you very much.

Website = https://burocomercial.profeco.gob.mx/index.jsp

Code:

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import time

driver = webdriver.Chrome(path_to_webdriver)
driver.get('https://burocomercial.profeco.gob.mx/index.jsp') #access website

# Search bar and search button
search_bar = driver.find_elements_by_xpath('//*[@id="txtbuscar"]')
search_button = driver.find_element_by_xpath('//*[@id="contenido"]/div[2]/div[2]/div[2]/div/div[2]/div/button')

# Perform search
search_bar[0].send_keys("inmobiliaria")
search_button.click()

# Select result
time.sleep(2)
xpath='//*[@id="resultados"]/div[4]/table/tbody/tr[1]/td[5]/button'
driver.find_elements_by_xpath(xpath)[0].click()

# Open calendar    
time.sleep(5)
driver.find_element_by_xpath('//*[@id="calI"]').click() #opens calendar
time.sleep(2)

# Hover-and-click on "<" (Here's the problem!!!)
cal_button=driver.find_element_by_xpath('//div[@id="ui-datepicker-div"]/div/a') 
time.sleep(4)
ActionChains(driver).move_to_element(cal_button).perform() #hover
prev_button = driver.find_element_by_class_name('ui-datepicker-prev') #catch element whose class was changed by the hover
ActionChains(driver).click(prev_button).perform() #click
time.sleep(1)
print('clicked on it a second ago. No exception was raised, but the click was not performed')
time.sleep(1)

Answer 1

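Here is a different approach using requests. I think Selenium should be the last option to reach for when web scraping. Usually you can retrieve the data from a web page by emulating the requests that the web application itself makes.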

import requests
from bs4 import BeautifulSoup as BS
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36'}
## Starts session
s = requests.Session()
s.headers = headers
url_base = 'https://burocomercial.profeco.gob.mx/'
ind = 'index.jsp'
resp0 = s.get(url_base+ind) ## First request, to get the 'name' parameter that is dynamic
soup0 = BS(resp0.text, 'lxml')
param_name = soup0.select_one('input[id="txtbuscar"]')['name']
action = 'BusGeneral' ### The action when submit the form
keyword = 'inmobiliaria' # Word to search
data_buscar = {param_name:keyword,'yy':'2017'} ### Data submitted
resp1 = s.post(url_base+action,data=data_buscar) ## second request: make the search
resp2 = s.get(url_base+ind) # Third request: retrieve the results
print(resp2.text)
queja = 'Detalle_Queja.jsp' ## Action when Quejas selected
data_queja = {'Lookup':'2','Val':'1','Bus':'2','FI':'28-Nov-2016','FF':'28-Feb-2017','UA':'0'} # Data for queja form
## Lookup is the number of the row in the table, FI is the initial date and FF, the final date, UA is Unidad Administrativa
## You can change these parameters to obtain different queries.
resp3 = s.post(url_base+queja,data=data_queja) # retrieve Quejas results
print(resp3.text)

With this, I got:

'\r\n\r\n\r\n\r\n\r\n\r\n1|<h2>ABITARE PROMOTORA E INMOBILIARIA, SA DE CV</h2>|0|0|0|0.00|0.00|0|0.00|0.00|0.00|0.00|0 % |0 % ||2'

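This contains the data used in the web page. Maybe this answer isn't exactly what you're looking for, but I think you'll find it easier to work with requests.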
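The pipe-delimited response above still has to be split into fields before it is useful. The following is a minimal sketch using only the standard library; the helper names `split_fields` and `strip_tags` are hypothetical, and treating the second field as the company name is an assumption based solely on the sample output shown above.

```python
import re

# Response text copied from the sample output above.
raw = ('\r\n\r\n\r\n\r\n\r\n\r\n1|<h2>ABITARE PROMOTORA E INMOBILIARIA, SA DE CV</h2>'
       '|0|0|0|0.00|0.00|0|0.00|0.00|0.00|0.00|0 % |0 % ||2')


def split_fields(text):
    """Split the pipe-delimited payload into a list of whitespace-stripped strings."""
    return [field.strip() for field in text.strip().split('|')]


def strip_tags(fragment):
    """Remove simple HTML tags such as <h2>...</h2> from a fragment."""
    return re.sub(r'<[^>]+>', '', fragment).strip()


fields = split_fields(raw)
# Assumption: the second field carries the company name wrapped in <h2> tags.
company = strip_tags(fields[1])
print(company)
print(fields[2:])
```

The meaning of the remaining numeric fields is not documented in this thread, so the sketch leaves them as raw strings rather than guessing at column names.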

Answer 2

You don't need to hover over the "<" button to click it.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome(path_to_webdriver)
driver.get('https://burocomercial.profeco.gob.mx/index.jsp') #access website

# set up wait
wait = WebDriverWait(driver, 10)

# Perform search
driver.find_element_by_id('txtbuscar').send_keys("inmobiliaria")
driver.find_element_by_css_selector('button[alt="buscar"]').click()

# Select result
xpath='//*[@id="resultados"]/div[4]/table/tbody/tr[1]/td[5]/button'
wait.until(EC.element_to_be_clickable((By.XPATH, xpath))).click()

# Open calendar
wait.until(EC.element_to_be_clickable((By.ID, 'calI'))).click() #opens calendar
wait.until(EC.visibility_of_element_located((By.ID, 'ui-datepicker-div')))

# Click on "<"
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'a[title="Ant"]'))).click()

A few things:

If your XPath contains only an ID, use .find_element_by_id() instead. It's faster and easier to read.
If you only use the first element of a collection, e.g. search_bar, just use .find_element_* instead of .find_elements_*.
Don't use sleeps. Sleeps are a bad practice and lead to unreliable tests. Use expected conditions instead, e.g. wait for an element to be clickable.
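
To illustrate why waiting on a condition is more reliable than a fixed sleep, here is a minimal sketch of the polling idea in plain Python. It is not Selenium's implementation; `wait_until` and `element_is_ready` are hypothetical names, and the "element" is simulated with a timer so the example runs without a browser.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} seconds')
        time.sleep(poll_interval)


# Hypothetical stand-in for a page element that becomes ready after a short delay.
ready_at = time.monotonic() + 0.3


def element_is_ready():
    return time.monotonic() >= ready_at


start = time.monotonic()
wait_until(element_is_ready, timeout=5.0, poll_interval=0.05)
elapsed = time.monotonic() - start
print(f'ready after {elapsed:.2f}s')
```

A fixed sleep always costs its full duration and still fails if the page is slower than expected. A polling wait returns as soon as the condition holds and raises a clear error when it never does.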

Python / Selenium "hover-and-click" doesn't work on a WebElement whose class changes on hover
