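I need to automatically download files from the following website (URL: https://www.legislation.qld.gov.au/subscribers/inforce/current). There are about 250 files to download. I used urllib to download them, but the resulting files have some problems that prevent us from continuing our work.
However, when I download a file manually by right-clicking the link and saving it to a folder, the file works as we need.
I am trying to use the Selenium ActionChains library to simulate the right-click followed by a click on "Save link as", but it does not work. I have searched Stack Overflow for an answer and found nothing, apart from one question where someone answered that Selenium cannot do this.
I can click the link with Selenium, but that only opens the XML in a new tab instead of downloading the file to my directory.
Could you please tell me how to solve this with Python?
Please see the code I have tried.
Using urllib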
import urllib.request
import csv

URL = 'https://www.legislation.qld.gov.au/subscribers/inforce/current/'
count = 0
with open(r"D:\LEG_DOWNLOAD\urls.csv") as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    # Skip the heading line
    next(readCSV)
    for row in readCSV:
        LEG = row[0]
        file_url = URL + LEG
        print("downloading -> ", file_url)
        response = urllib.request.urlopen(file_url)
        data = response.read()
        # Write data to file
        filename = LEG
        file_ = open(r"D:\LEG_DOWNLOAD\QLDLEG\XML\NEW/" + filename, 'w')
        file_.write(data.decode('utf-8'))
        file_.close()
        count += 1
        print(filename + " saved.\n")
print(str(count) + " files downloaded")
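One possible reason the urllib output differs from the manually saved files is the decode-then-write step. This is an assumption, not something confirmed above. Opening the file with 'w' on Windows re-encodes the text with the locale encoding and translates newlines, so the bytes on disk may no longer match what the server sent. A minimal sketch that writes the response body unchanged follows. The names download_raw and opener are hypothetical, introduced only for illustration.

```python
import urllib.request
from pathlib import Path


def download_raw(url, dest_dir, filename, opener=urllib.request.urlopen):
    """Fetch url and write the response body to dest_dir/filename unchanged.

    `opener` is a hypothetical hook so the function can be tried without
    network access; by default it is urllib.request.urlopen.
    """
    dest = Path(dest_dir) / filename
    dest.parent.mkdir(parents=True, exist_ok=True)
    with opener(url) as response:
        data = response.read()
    # write_bytes stores the bytes exactly as received:
    # no decoding, no re-encoding, no newline translation.
    dest.write_bytes(data)
    return dest
```

Inside the CSV loop, the call would look like download_raw(URL + LEG, r"D:\LEG_DOWNLOAD\QLDLEG\XML\NEW", LEG). Whether this removes the problem depends on what is actually wrong with the files, which the description above does not specify.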
Using Selenium
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver import ActionChains

options = webdriver.ChromeOptions()
prefs = {
    "download.default_directory": r"D:\LEG_DOWNLOAD\QLDLEG\XML\NEW",
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": True}
options.add_experimental_option('prefs', prefs)
browser = webdriver.Chrome(executable_path=r'D:\CHROME\chromedriver.exe', chrome_options=options)
browser.get('https://www.legislation.qld.gov.au/subscribers/inforce/current')
elems = browser.find_elements_by_xpath("//*[contains(@href,'/subscribers/inforce/current/act-')]")
# Collect the link texts first so each link can be looked up again after navigating back
elemstext = []
for link in elems:
    elemstext.append(link.text)
for linktext in elemstext:
    action = ActionChains(browser)
    # Right-click the link, then try to choose "Save link as" with the arrow keys
    action.move_to_element(browser.find_element_by_link_text(linktext)).context_click().send_keys(Keys.ARROW_DOWN).send_keys(Keys.ARROW_DOWN).send_keys(Keys.RETURN).perform()
    time.sleep(10)
    browser.back()
    time.sleep(10)
browser.quit()
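An alternative to driving the browser's context menu is to collect the link targets and request them directly. The sketch below uses only the standard library and shows the collection step on a hypothetical HTML snippet. The markup in sample, the class LinkCollector and the function collect_links are assumptions for illustration, and the real page may be structured differently. It also only looks at a elements, whereas the XPath above matches any element with a matching href.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.legislation.qld.gov.au/subscribers/inforce/current"
PATTERN = "/subscribers/inforce/current/act-"


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> values that contain a pattern."""

    def __init__(self, base_url, pattern):
        super().__init__()
        self.base_url = base_url
        self.pattern = pattern
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.pattern in href:
            # Resolve relative hrefs against the page URL.
            self.links.append(urljoin(self.base_url, href))


def collect_links(html, base_url=BASE_URL, pattern=PATTERN):
    parser = LinkCollector(base_url, pattern)
    parser.feed(html)
    return parser.links


# Hypothetical markup, only to show the call; the real page may differ.
sample = (
    '<a href="/subscribers/inforce/current/act-0000-001.xml">Example Act</a>'
    '<a href="/about">About</a>'
)
print(collect_links(sample))
```

With the Selenium session above, link.get_attribute("href") on each element in elems should give the same kind of absolute URL without parsing the HTML by hand. If the site requires a login, the requests would also need the session's cookies, which this sketch does not handle.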