How do I iterate over dynamic select options using Selenium and Python?

Time: 2018-06-21 10:05:30

Tags: python selenium web-scraping webautomation

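I have run into the following scraping problem: I need to scrape data for selected commodities, covering 2000 to 2018, from this site (http://agmarknet.gov.in/PriceTrends/Sa_Week_pri.aspx).

My goal is to iterate over every option in the year, month, and week dropdowns, year by year, click the submit button for each combination, and then use the "Export to CSV" button to get the web table data. However, all of the dropdown options are generated dynamically.

I am new to Selenium and Python. I have tried my best to solve this, but I could not get it to work.

Here is my code: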

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
import time
from bs4 import BeautifulSoup
import pandas as pd

browser = webdriver.Chrome("path")
browser.get('http://agmarknet.gov.in/PriceTrends/Sa_Week_pri.aspx')
wait = WebDriverWait(browser, 19)

#iterating through drop down options
commodities=['17','49','260','28','29','266','112','45','367','366','10','312','372','4','20','23','2','24','3','11','13','48','150','285','44','141','78','369']
years=['2002','2003','2004','2005','2006','2007','2008','2009','2010','2011','2012','2013','2014','2015','2016','2017','2018']
months=['1','2','3','4','5','6','7','8','9','10','11','12']
weeks=['1','2','3','4']
for p in commodities:

    Select(browser.find_element_by_xpath('//select[@name="ctl00$cphBody$Commod1_List"]')).select_by_value(p)
    for q in years:
        element = wait.until(EC.element_to_be_clickable((By.ID, 'cphBody_Year3_List')))
        Select(browser.find_element_by_xpath('//*[@id="cphBody_Year3_List"]')).select_by_value(q)
        for r in months:
            element = wait.until(EC.element_to_be_clickable((By.ID, 'cphBody_Month3_List')))
            Select(browser.find_element_by_xpath('//*[@id="cphBody_Month3_List"]')).select_by_value(r)
            for s in weeks:
                element = wait.until(EC.element_to_be_clickable((By.ID, 'cphBody_Week2_List')))
                time.sleep(10)
                Select(browser.find_element_by_xpath('//*[@id="cphBody_Week2_List"]')).select_by_value(s)
                browser.find_element_by_xpath('//*[@id="cphBody_Button_Subm"]').click()
                time.sleep(10)
                browser.find_element_by_name('ctl00$cphBody$Button1').click()
                browser.back()
                break
            break
        break
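For comparison, here is a minimal sketch of how the nested loops above could be restructured. It is untested against the live site. It assumes the Selenium 4 API (`find_element(By..., ...)` in place of `find_element_by_xpath`), that `ctl00$cphBody$Button1` is the "Export to CSV" button, and that reloading the page for each combination is acceptable in place of `browser.back()`. The helper names `iter_combinations` and `scrape_all` are made up for illustration. Instead of fixed `time.sleep(10)` calls, each selection is retried until the dropdown and the requested option exist.

```python
import itertools


def iter_combinations(commodities, years, months, weeks):
    """Yield every (commodity, year, month, week) tuple, commodity varying slowest."""
    return itertools.product(commodities, years, months, weeks)


def scrape_all(driver, url, combinations, timeout=20):
    """Select each combination on a freshly loaded page, submit, then export."""
    # Imported here so iter_combinations() stays usable without Selenium installed.
    from selenium.common.exceptions import (
        NoSuchElementException,
        StaleElementReferenceException,
    )
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import Select, WebDriverWait

    # Keep retrying while a <select> or <option> is missing or was replaced by a reload.
    wait = WebDriverWait(
        driver,
        timeout,
        ignored_exceptions=(NoSuchElementException, StaleElementReferenceException),
    )

    def pick(css, value):
        def attempt(d):
            # Re-locate the <select> on every attempt; old references may be stale.
            Select(d.find_element(By.CSS_SELECTOR, css)).select_by_value(value)
            return True

        wait.until(attempt)

    def click(css):
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, css))).click()

    for commodity, year, month, week in combinations:
        driver.get(url)  # start every combination from a fresh form
        pick('select[name="ctl00$cphBody$Commod1_List"]', commodity)
        pick("#cphBody_Year3_List", year)
        pick("#cphBody_Month3_List", month)
        pick("#cphBody_Week2_List", week)
        click("#cphBody_Button_Subm")
        click('[name="ctl00$cphBody$Button1"]')  # assumed to be "Export to CSV"
```

It would be called as `scrape_all(browser, url, iter_combinations(commodities, years, months, weeks))` with the lists defined above. A combination whose option never appears (for example, a week that does not exist for a given month) raises `TimeoutException`, so a real run would probably want to catch that and move on to the next combination.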

0 answers:

No answers yet