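Below is the beginning of my Python scraping code, which has successfully extracted all the data for the past year. My Firefox browser (version 65.0.2, 64-bit) was recently updated, and now the code no longer goes directly to the target URL. Instead, it launches Firefox and stays on a blank browser page until the code times out. I recently updated my Selenium package (version 3.141.0) and tried changing the set_preference calls in the code, but I have not been able to fix the problem. Does anyone know how to solve this? Thanks in advance!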
import sys
import pandas as pd
import os
import time
from datetime import datetime
from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from sqlalchemy import create_engine
button_text_to_url_type = {
    'dashboard': 8,
    'standard': 0,
    'advanced': 1,
    'batted_ball': 2,
    'win_probability': 3,
    'pitch_type': 4,
    'pitch_values': 7,
    'plate_discipline': 5,
    'value': 6,
    'h_movement': 18,
    'v_movement': 19
}
download_dir = os.getcwd()
profile = FirefoxProfile("C:/Users/nhwal_000/AppData/Roaming/Mozilla/Firefox/Profiles/zd6yzhfi.FG_Scrape")
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", 'text/csv')
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.download.dir", download_dir)
profile.set_preference("browser.download.folderList", 2)
driver = webdriver.Firefox(firefox_profile=profile)
today = datetime.today()
for button_text, url_type in button_text_to_url_type.items():
    default_filepath = os.path.join(download_dir, 'Fangraphs Leaderboard.csv')
    desired_filepath = os.path.join(download_dir,
                                    '{}_{}_{}_LeaderboardPIT_{}.csv'.format(datetime.today().year, today.month, today.day,
                                                                            button_text))
    driver.get(
        "https://www.fangraphs.com/leaders.aspx?pos=all&stats=pit&lg=all&qual=0&type={}&season=2018&month=0&season1=2018&ind=0&team=&rost=&age=&filter=&players=".format(
            url_type))
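Answer 0 (score: 1)
I don't see any reference to geckodriver, so you may just need to install the latest version of GeckoDriver.
Once it is installed, you can pass its path to the webdriver, for example: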
driver = webdriver.Firefox(executable_path=r'your\path\to\geckodriver.exe', firefox_profile=profile)
It should then run.
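If it is unclear whether geckodriver is installed or where it lives, a small lookup can fail early with a readable message instead of leaving Firefox on a blank page. The following is only a sketch, assuming Selenium 3.x as in the question (where webdriver.Firefox accepts executable_path and firefox_profile). The helper names find_geckodriver and make_driver are hypothetical and not part of Selenium.

```python
import shutil
from pathlib import Path


def find_geckodriver(explicit_path=None):
    """Return a geckodriver path as a string, or None if none is found.

    Hypothetical helper: prefers an explicitly given file, then falls
    back to searching the directories listed in PATH.
    """
    if explicit_path and Path(explicit_path).is_file():
        return str(Path(explicit_path))
    return shutil.which("geckodriver")


def make_driver(profile=None, explicit_path=None):
    """Start Firefox with an explicit geckodriver path (Selenium 3.x style)."""
    # Imported here so the lookup helper works even without Selenium installed.
    from selenium import webdriver

    path = find_geckodriver(explicit_path)
    if path is None:
        raise FileNotFoundError(
            "geckodriver not found; install it and add it to PATH "
            "or pass its location explicitly"
        )
    # Assumption: Selenium 3.x keyword arguments, as used in the answer above.
    return webdriver.Firefox(executable_path=path, firefox_profile=profile)
```

With this in place, the line that creates the driver in the question would become driver = make_driver(profile, r'your\path\to\geckodriver.exe'), with the placeholder path replaced by the real location.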