I need to scrape data from this website. In the past I have only scraped pages that did not have dropdown menus; I have never scraped through a dropdown before. On the page above, I have already scraped and exported the data for the 'Ichapuram' case, which is a single option in the 'Select Constituency' dropdown.
Here is that code:
import requests
url = 'http://results.eci.gov.in/ac/en/constituencywise/ConstituencywiseS011.htm'
r = requests.get(url)  # send the request to the page

from bs4 import BeautifulSoup  # for parsing the HTML
content_parser = BeautifulSoup(r.content, 'html.parser')
rows = content_parser.find_all("tr", {'style': 'font-size:12px;'})  # find the rows of the table (7 in this case)

data = []  # to store the scraped data
for i in range(0, len(rows)):
    row = rows[i]
    col = row.find_all("td")  # each row contains 7 columns enclosed in <td></td> tags
    candidate = col[1].text.strip()  # .text copies the text of the element; .strip() removes surrounding whitespace
    party = col[2].text.strip()
    votes = int(col[5].text.strip())
    percentage = float(col[6].text.strip())
    data.append((candidate, party, votes, percentage))
# The loop above appends the four columns of interest for every row

import pandas as pd
data_frame = pd.DataFrame(data, columns=['candidate', 'party', 'votes', 'percentage'])
# Store the whole data set as a DataFrame (which can easily be exported to XLS/CSV)
data_frame = data_frame.set_index('party')  # set the "party" column as the index

from pandas import ExcelWriter
datatoexcel = ExcelWriter('C:/Users/SHRI/AppData/Local/Programs/Python/Python37/apelections.xlsx')
data_frame.to_excel(datatoexcel, 'Sheet1')
datatoexcel.save()
Here is the result of the above code for a single constituency:
However, there are 175 other such cases (constituencies) in the dropdown menu. How do I iterate this process? Also, the table dimensions differ from one constituency to the next. How can I handle this in Python and scrape the data for the remaining 175 options in the 'Select Constituency' dropdown? Thanks!
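One direction I was considering is simply looping over the per-constituency URLs, since the single-constituency page above is ConstituencywiseS011.htm. The sketch below assumes the other pages follow the same pattern ConstituencywiseS01<N>.htm with N running from 1 to 175 (I have not verified this against the option values inside the dropdown's <select> element), and it copes with tables of different sizes by iterating over however many matching rows each page actually has. Would something like this be the right approach?

import requests
import pandas as pd
from bs4 import BeautifulSoup

# Assumption: the per-constituency URLs follow the pattern of the Ichapuram page,
# i.e. 'ConstituencywiseS01<N>.htm' with N running from 1 to 175.
base_url = 'http://results.eci.gov.in/ac/en/constituencywise/ConstituencywiseS01{}.htm'

all_data = []  # rows collected from every constituency
for n in range(1, 176):
    r = requests.get(base_url.format(n))
    parser = BeautifulSoup(r.content, 'html.parser')
    rows = parser.find_all("tr", {'style': 'font-size:12px;'})
    for row in rows:  # looping over whatever rows exist handles tables of different sizes
        col = row.find_all("td")
        if len(col) < 7:  # skip any row that lacks the expected 7 columns
            continue
        all_data.append((n,
                         col[1].text.strip(),          # candidate
                         col[2].text.strip(),          # party
                         int(col[5].text.strip()),     # votes
                         float(col[6].text.strip())))  # percentage

df = pd.DataFrame(all_data, columns=['constituency_no', 'candidate', 'party', 'votes', 'percentage'])
df.to_excel('apelections_all.xlsx', index=False)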