Question

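I am using Selenium to automate a download from this website:

http://www.diva-gis.org/datadown

I can click "OK" on the first page without any problem, but I cannot click "Download" on the second page. The error message is "no such element".

Here is my code: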

from selenium import webdriver
import os

driver=webdriver.Chrome(os.path.expanduser('./chromedriver'))
driver.get('http://www.diva-gis.org/gdata')

driver.find_element_by_xpath('//*[@id="node-36"]/div/div/div/div/form/p[1]/select/option[190]').click()
driver.find_element_by_xpath('//*[@id="node-36"]/div/div/div/div/form/p[3]/input').click()


# this is the one that has the problem
driver.find_element_by_xpath('//*[@id="node-39"]/div/div/div/div/a/h2').click()

I have tried find_element_by_xpath, find_element_by_class_name, and others, and none of them work. Can anyone familiar with Selenium help me solve this?

Answer 1

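You are looking for the button a millisecond or two after clicking the "OK" button, which gives the page no chance to render. You need to wait for the page to refresh before you search for the "Download" button.

For example: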

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

...

try:
    # Poll for up to 10 seconds until the "Download" link is in the DOM.
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.LINK_TEXT, "Download")))
    element.click()
except TimeoutException:
    print("unable to find the Download button after 10 seconds")

Here is a complete working example:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("http://www.diva-gis.org/datadown")

# Submit the form with its default selections (same effect as clicking "OK").
form = driver.find_element(By.TAG_NAME, "form")
form.submit()

try:
    # Poll for up to 10 seconds until the "Download" link is in the DOM.
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.LINK_TEXT, "Download")))
    element.click()
except TimeoutException:
    print("unable to find the Download button after 10 seconds")

driver.close()
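
To show why the explicit wait helps, here is a rough sketch of the polling idea behind it. This is a simplified stand-in, not Selenium's actual implementation, and `wait_until` is a hypothetical helper name: it keeps calling a condition until the condition returns something truthy or the timeout expires.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success. (Hypothetical helper for illustration.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Give the page (or whatever we are waiting on) time to change.
        time.sleep(poll_interval)
```

With a real driver, the condition could be something like `lambda: driver.find_elements(By.LINK_TEXT, "Download")`, which returns an empty (falsy) list until the link exists. In practice WebDriverWait already does this for you, so the sketch is only meant to explain the behaviour.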

Answer 2

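Try this. I completed the task using neither Selenium nor regular expressions. After running it, the zip file you want is downloaded according to the folder you selected.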

library(tidyverse)

iris %>%
  mutate_at(.funs = scale, .vars = vars(-c(Species))) %>%
  rowwise() %>% 
  mutate(my_mean=mean(c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)))

Answer 3

What error did you get? Check whether the page contains any frames. If it does, you need to switch to the frame first.
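
A minimal sketch of what switching to a frame could look like, assuming the element might sit inside an iframe (this has not been confirmed for this page). `find_in_frames` is a hypothetical helper name. It uses only `find_elements`, `switch_to.frame` and `switch_to.default_content`, and passes locator strategies as plain strings (`"tag name"` is the value of `By.TAG_NAME`).

```python
def find_in_frames(driver, by, value):
    """Return the first matching element, or None if nothing matches.

    Searches the top-level document first, then each <iframe> in turn.
    On success the driver stays switched into the frame that holds the
    element, so the caller can click it right away.
    (Hypothetical helper for illustration.)
    """
    driver.switch_to.default_content()
    matches = driver.find_elements(by, value)
    if matches:
        return matches[0]

    # "tag name" is the string behind selenium's By.TAG_NAME constant.
    for frame in driver.find_elements("tag name", "iframe"):
        driver.switch_to.frame(frame)
        matches = driver.find_elements(by, value)
        if matches:
            return matches[0]
        # Nothing here: go back to the top before trying the next frame.
        driver.switch_to.default_content()
    return None
```

For example, `find_in_frames(driver, "link text", "Download")` would return the link if it is found at the top level or inside one of the page's iframes. Remember to call `driver.switch_to.default_content()` afterwards if you need the main document again.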

Answer 4

Update

The link is not HTML-compliant, but Selenium manages to find the link itself anyway.

So, as the answer above says, all you have to do is wait a few seconds.

Another solution: using only requests

This is not Selenium, but this solution can make your scraper much faster than selenium.

The problem with the website is that the download link is not HTML-compliant.

The link looks like this:

<a href=http://biogeo.ucdavis.edu/data/diva/adm/KOR_adm.zip> <h2>Download</h2></a>

It should look like this:

<a href="http://biogeo.ucdavis.edu/data/diva/adm/KOR_adm.zip">
<h2>Download</h2></a>

So Selenium cannot find the tag element properly. Neither can BeautifulSoup.

So I used re to find the link.

Try this code:

import re
import requests

cookies = {
    'has_js': '1',
}

headers = {
    'Origin': 'http://www.diva-gis.org',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'ko-KR,ko;q=0.8,en-US;q=0.6,en;q=0.4,la;q=0.2,da;q=0.2',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Cache-Control': 'max-age=0',
    'Referer': 'http://www.diva-gis.org/gdata',
    'Connection': 'keep-alive',
    'DNT': '1',
}

# Form fields sent by the first page: country, theme, and the OK button.
data = [
    ('cnt', 'AFG_Afghanistan'),
    ('thm', 'adm#Administrative areas (GADM)'),
    ('OK', 'OK'),
    ('_submit_check', '1'),
]

r = requests.post('http://www.diva-gis.org/datadown', headers=headers, cookies=cookies, data=data)

# Pull the zip URL out of the raw HTML, since the href is not quoted.
pattern = re.compile('href=.+zip')
zip_link = pattern.findall(r.text)[0].replace("href=", '').replace("'", '')
print(zip_link)

# Everything after the last slash is the file name.
filename = re.findall("[^/]*$", zip_link)[0]

response = requests.get(zip_link, stream=True)

# Stream the download to disk in small chunks.
with open(filename, "wb") as f:
    for chunk in response.iter_content(chunk_size=512):
        if chunk:
            f.write(chunk)

This code does not use Selenium. It pretends to be Chrome and sends a POST request so that you can get the information.

All you need to do is change the data variable.
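
The link-extraction step can also be tried on its own, without hitting the site. The following is only a sketch: `extract_zip_link` and `filename_from_url` are hypothetical helper names, and the pattern is a slightly stricter variant of the one above that accepts both the unquoted and the quoted form of the href.

```python
import re


def extract_zip_link(html):
    """Return the first href that ends in .zip, or None if there is none.

    Accepts quoted and unquoted attribute values.
    (Hypothetical helper for illustration.)
    """
    match = re.search(r"""href=["']?([^"'\s>]+\.zip)""", html)
    return match.group(1) if match else None


def filename_from_url(url):
    """Return everything after the last slash in the URL."""
    return url.rsplit("/", 1)[-1]


unquoted = "<a href=http://biogeo.ucdavis.edu/data/diva/adm/KOR_adm.zip> <h2>Download</h2></a>"
link = extract_zip_link(unquoted)
print(link)
print(filename_from_url(link))
```

Because the capture group stops at the first quote, whitespace, or `>`, it should not run past the end of the URL the way a greedy `.+` could if the page contained more than one `zip` on the same line.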
