I need to scrape data from a page with multiple dropdown menus using Python

Posted: 2019-05-25 16:39:56

Tags: python web-scraping beautifulsoup python-requests

I need to scrape data from this website. In the past, I have only scraped pages that did not have dropdown menus.

I have never scraped through a dropdown menu before. On the page above, I have already scraped and exported the data for the "Ichapuram" case, i.e. a single option from the "Select Constituency" dropdown menu.

Here is the code for that:

import requests
from bs4 import BeautifulSoup           # for parsing the HTML
import pandas as pd
from pandas import ExcelWriter

url = 'http://results.eci.gov.in/ac/en/constituencywise/ConstituencywiseS011.htm'
r = requests.get(url)                    # send a request to the page
content_parser = BeautifulSoup(r.content, 'html.parser')

rows = content_parser.find_all("tr", {'style': 'font-size:12px;'})   # rows of the results table (7 in this case)
data = []                                # to store the scraped data

for row in rows:
    col = row.find_all("td")             # each row contains 7 columns enclosed in <td> </td> tags
    candidate = col[1].text.strip()      # .text copies the text of the cell, .strip() removes surrounding whitespace
    party = col[2].text.strip()
    votes = int(col[5].text.strip())
    percentage = float(col[6].text.strip())
    data.append((candidate, party, votes, percentage))

# Store the whole data set as a DataFrame (which can easily be converted to XLS/CSV files)
data_frame = pd.DataFrame(data, columns=['candidate', 'party', 'votes', 'percentage'])
data_frame = data_frame.set_index('party')   # set the "party" column as the index

datatoexcel = ExcelWriter('C:/Users/SHRI/AppData/Local/Programs/Python/Python37/apelections.xlsx')
data_frame.to_excel(datatoexcel, 'Sheet1')
datatoexcel.save()
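
Since the question below asks to repeat this for every constituency, it may help to first wrap the same steps into a reusable function. This is only a minimal sketch; it assumes every constituency page uses the same table markup as ConstituencywiseS011.htm, and scrape_constituency is a name introduced here for illustration.

import requests
import pandas as pd
from bs4 import BeautifulSoup

def scrape_constituency(url):
    # Fetch one constituency page and parse its results table into a DataFrame.
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    rows = soup.find_all("tr", {'style': 'font-size:12px;'})
    data = []
    for row in rows:                            # works for any number of rows
        col = row.find_all("td")
        data.append((col[1].text.strip(),       # candidate
                     col[2].text.strip(),       # party
                     int(col[5].text.strip()),  # votes
                     float(col[6].text.strip())))  # percentage
    return pd.DataFrame(data, columns=['candidate', 'party', 'votes', 'percentage'])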

Here is the result of the above code for a single constituency:

[image: The result]

However, there are 175 other such cases (constituencies) in the dropdown menu. So how do I iterate this process? Also, the table size will be different for each dropdown option. How can I use Python to handle this and scrape the data for the remaining 175 options in the "Select Constituency" dropdown? Thanks!
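
One possible approach, offered only as a sketch since the exact structure of the dropdown and the URL pattern would need to be verified against the page source, is to read the option values of the "Select Constituency" <select> element and build each constituency's page URL from them. Because the scraping loop iterates over whatever rows find_all returns, tables of different sizes are handled automatically.

import requests
import pandas as pd
from bs4 import BeautifulSoup

base = 'http://results.eci.gov.in/ac/en/constituencywise/'
start = requests.get(base + 'ConstituencywiseS011.htm')
soup = BeautifulSoup(start.content, 'html.parser')

# Assumption: the "Select Constituency" dropdown is a plain <select> whose
# option values contain the constituency code used in the page file name
# (e.g. "S011", "S012", ..., "S01175"). Check the page source for the
# select's name/id and the exact value format before relying on this.
dropdown = soup.find('select')
codes = [o.get('value') for o in dropdown.find_all('option') if o.get('value')]

frames = []
for code in codes:
    url = base + 'Constituencywise{}.htm'.format(code)
    df = scrape_constituency(url)       # function sketched above
    df['constituency'] = code           # remember which page each row came from
    frames.append(df)

all_results = pd.concat(frames, ignore_index=True)
all_results.to_excel('apelections_all.xlsx', index=False)

If the option values turn out not to match the file names, an alternative guess is simply to loop n from 1 to 175 and build 'ConstituencywiseS01{}.htm'.format(n), since the single-constituency URL above ends in S011.htm. And if the dropdown is filled in by JavaScript rather than present in the static HTML, the values would have to be read from the rendered page (for example with Selenium) instead.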

0 Answers:

There are no answers yet.