Automatically scraping data based on href

Time: 2019-06-04 11:08:28

Tags: python selenium-webdriver beautifulsoup

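I have a problem automating multiple pages with Selenium WebDriver and Python. My code clicks through the pages automatically up to page 10, but after that it stops working: pages from number 11 onward are never clicked.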

import urllib.request
from bs4 import BeautifulSoup
import csv
import os
from selenium import webdriver
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.keys import Keys
import time
import pandas as pd



url = 'http://www.igrmaharashtra.gov.in/eASR/eASRCommon.aspx?hDistName=Buldhana'
chrome_path = r'C:/Users/User/AppData/Local/Programs/Python/Python36/Scripts/chromedriver.exe'
d = webdriver.Chrome(executable_path=chrome_path)
d.implicitly_wait(10)
d.get(url)



Select(d.find_element_by_name('ctl00$ContentPlaceHolder5$ddlTaluka')).select_by_value('7')
Select(d.find_element_by_name('ctl00$ContentPlaceHolder5$ddlVillage')).select_by_value('1464')
page = [page.get_attribute('href') for page in
        d.find_elements_by_css_selector(
            "#ctl00_ContentPlaceHolder5_grdUrbanSubZoneWiseRate [href*='Page$']")]

while True:
    pages = [page.get_attribute('href') for page in
             d.find_elements_by_css_selector(
                 "#ctl00_ContentPlaceHolder5_grdUrbanSubZoneWiseRate [href*='Page$']")]

    for script_page in pages:
        d.execute_script(script_page)
        #print(script_page)

1 answer:

Answer 0 (score: 0)

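Try using a page index and checking whether the link for that page is available; you have to click each page in turn before moving on to the next one. Try the following code.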

from selenium import webdriver
from selenium.webdriver.support.select import Select  # needed for the dropdowns below

url = 'http://www.igrmaharashtra.gov.in/eASR/eASRCommon.aspx?hDistName=Buldhana'
chrome_path = r'C:/Users/User/AppData/Local/Programs/Python/Python36/Scripts/chromedriver.exe'
d = webdriver.Chrome(executable_path=chrome_path)
d.implicitly_wait(10)
d.get(url)
Select(d.find_element_by_name('ctl00$ContentPlaceHolder5$ddlTaluka')).select_by_value('7')
Select(d.find_element_by_name('ctl00$ContentPlaceHolder5$ddlVillage')).select_by_value('1464')

# CSS selector template for the pager link that posts back to page i
selector = "#ctl00_ContentPlaceHolder5_grdUrbanSubZoneWiseRate a[href*='Page${}']"
i = 2  # page 1 is already displayed, so start from page 2
while True:
    # Look the link up again on every pass, after the previous click
    links = d.find_elements_by_css_selector(selector.format(i))
    if len(links) > 0:
        print(links[0].get_attribute('href'))
        links[0].click()
        i += 1
    else:
        # No link for page i: the last page has been reached
        break

Output (it begins at Page$2 because I started from page 2):


javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$2')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$3')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$4')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$5')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$6')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$7')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$8')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$9')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$10')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$11')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$12')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$13')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$14')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$15')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$16')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$17')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$18')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$19')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$20')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$21')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$22')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$23')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$24')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$25')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$26')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$27')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$28')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$29')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$30')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$31')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$32')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$33')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$34')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$35')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$36')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$37')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$38')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$39')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$40')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$41')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$42')

Process finished with exit code 0
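
As a rough illustration of why re-checking the pager for each page index reaches all 42 pages, here is a minimal sketch that runs without a browser. It assumes the grid renders its links in blocks of 10, with '...' links leading to the neighbouring blocks. That layout, and the helper names `page_number`, `make_href`, `visible_hrefs` and `crawl`, are assumptions made for illustration and are not taken from the site itself.

```python
import re

# Matches the page argument inside hrefs such as
# javascript:__doPostBack('...grdUrbanSubZoneWiseRate','Page$12')
PAGE_ARG = re.compile(r"Page\$(\d+)'")


def page_number(href):
    """Return the page number a pager href posts back to, or None."""
    match = PAGE_ARG.search(href)
    return int(match.group(1)) if match else None


def make_href(page):
    """Build an href in the same shape as the output above."""
    return ("javascript:__doPostBack("
            "'ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page${}')"
            .format(page))


def visible_hrefs(current, last_page, window=10):
    """Hypothetical pager model: only one block of `window` links is rendered.

    The current page has no link; '...' links lead to the neighbouring
    blocks. This is an assumption, not the site's real markup.
    """
    first = (current - 1) // window * window + 1
    last = min(first + window - 1, last_page)
    pages = [p for p in range(first, last + 1) if p != current]
    if first > 1:
        pages.insert(0, first - 1)   # leading '...' link
    if last < last_page:
        pages.append(last + 1)       # trailing '...' link
    return [make_href(p) for p in pages]


def crawl(last_page):
    """Advance one page at a time, re-reading the pager after every step."""
    current, target, visited = 1, 2, []
    while True:
        hrefs = visible_hrefs(current, last_page)
        # Exact match on the parsed number, so page 2 never matches page 20.
        if not any(page_number(h) == target for h in hrefs):
            break
        current = target          # stands in for clicking the link
        visited.append(current)
        target += 1
    return visited


first_view = [page_number(h) for h in visible_hrefs(1, 42)]
visited = crawl(42)
print(first_view)                              # links visible on page 1 in this model
print(visited[0], visited[-1], len(visited))   # first page, last page, count
```

In this model only the links for pages 2 to 11 exist while page 1 is displayed, which would be consistent with the clicks stopping around page 10 when the links are collected up front. Comparing the parsed number instead of using `href*='Page$2'` also avoids a substring match against `Page$20`, should both links ever be visible at the same time.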