Question

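For reference: http://wogcc.state.wy.us/urecordsMenu.cfm?Skip=%27Y%27&oops=ID14447

I'm trying to retrieve a zip file from a site that has no dedicated URL for the file. I've been getting along fine with Python Mechanize and Beautiful Soup, but I hit a problem near the end of the process.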

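After selecting the option I want in the form (via mechanize / bs4), I try to have my browser "submit" the form and retrieve my zip file. However, the "submit" button is just a gif image with an

onclick="javascript:submit()"

call. When you click that button manually in a browser, it redirects you to a generic ".....testdwn.cfm?RequestTimeout=2000" page, regardless of which option you selected before clicking the gif image (and it also downloads your zip file). So my problem is that there is no dedicated zip URL.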

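Based on what I've read online over the past few days, Python / Mechanize can't run JavaScript in any capacity, so it seems I'm SOL on that route. If mechanize could somehow click that button, everything would be fine.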
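A note on this point: an onclick handler that calls submit() frequently does nothing more than submit the enclosing form, which is an ordinary HTTP request that a script can send without executing any JavaScript. Whether that holds for this particular page is an assumption that would need checking against the page source. The sketch below only builds such a request with the standard library and does not send it; the action URL and the field name `county` are hypothetical placeholders, not values taken from the site.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(action_url, fields):
    """Build (but do not send) a POST request that mimics a plain form submit."""
    # Form fields are sent URL-encoded in the request body.
    body = urlencode(fields).encode("ascii")
    return Request(action_url, data=body, method="POST")


# Hypothetical action URL and field name, for illustration only.
request = build_form_post("http://example.com/download", {"county": "37"})
print(request.get_method(), request.full_url, request.data)
```

If the real form works this way, passing the request to `urllib.request.urlopen` and writing the response bytes to disk would yield the zip file. If the page's submit() function does more than submit the form, a real browser driven by Selenium is the safer route.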

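What approach should I take to extract this data? I've read about Selenium, but I'd like to know which option is the absolute simplest: something JavaScript-based, python-selenium, or something else? Python is preferred if it can be managed.

Thanks in advance!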

Answer 1

OK, I found an answer using Selenium:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains

# Point Selenium at the local chromedriver binary (Selenium 4 style).
driver = webdriver.Chrome(service=Service(r"C:\Users\xx\xx\xx\xx\xx\xx\chromedriver.exe"))
driver.get("http://wogcc.state.wy.us/urecordsMenu.cfm?Skip=%27Y%27&oops=ID14447")
assert "Download Menu" in driver.title

# The <option> to select, and the gif image that acts as the submit button.
option = driver.find_element(By.XPATH, "/html/body/table[2]/tbody/tr[7]/td/form/table[1]/tbody/tr[3]/td[2]/select/option[37]")
submit = driver.find_element(By.XPATH, "/html/body/table[2]/tbody/tr[7]/td/form/table[1]/tbody/tr[3]/td[1]/font/img")

# Click the option first, then the image; its onclick handler submits the form.
ActionChains(driver).move_to_element(option).click(option).perform()
ActionChains(driver).move_to_element(submit).click(submit).perform()

I navigated to the page and used Selenium's XPath element lookup, along with its ActionChains, to select and click everything I needed.
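One thing the script above does not cover is knowing when the browser has finished saving the zip file. A minimal sketch of one way to handle that, assuming Chrome saves into a known folder: poll the folder until a .zip file exists and no partial download (Chrome uses a .crdownload suffix for those) remains. The helper name `wait_for_zip` and its parameters are made up for illustration.

```python
import time
from pathlib import Path


def wait_for_zip(download_dir, timeout=60.0, poll=0.5):
    """Return the first finished .zip in download_dir, or None on timeout."""
    folder = Path(download_dir)
    deadline = time.monotonic() + timeout
    while True:
        # Chrome keeps in-progress downloads under a .crdownload suffix.
        partial = list(folder.glob("*.crdownload"))
        finished = sorted(folder.glob("*.zip"))
        if finished and not partial:
            return finished[0]
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

Calling `wait_for_zip` on the browser's download folder right after the second click, and only then calling `driver.quit()`, avoids closing the browser while the file is still being written.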

Retrieving a zip file from a website with an automated script
