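This is my first post on Stack Overflow. For my problem I read the post "request using python to asp.net page" (also about Data Scraping, aspx) and found what I wanted, but I need a little help.
My question is how to loop over all the dropdown options and click Search to get the page content. For example, I am able to scrape Agra and get the page details.
Since I am still learning, I am avoiding Selenium for now to fetch the page details.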
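Answer 0 (score: 1)
By the way, I have put the two huge strings into two variables, so you don't have to worry about them and the code is slimmer. Here is the corrected code: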
import requests
from bs4 import BeautifulSoup

url = "http://upreraportal.cloudapp.net/View_projects.aspx"
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
}

# Fetch the search page once to read the hidden ASP.NET state fields.
response = requests.get(url).text
soup = BeautifulSoup(response, "lxml")
VIEWSTATE = soup.select("#__VIEWSTATE")[0]['value']
EVENTVALIDATION = soup.select("#__EVENTVALIDATION")[0]['value']

# Loop over the district dropdown; the [:-1] slice leaves out the last option.
for title in soup.select("#ContentPlaceHolder1_DdlprojectDistrict [value]")[:-1]:
    search_item = title.text
    # print(search_item)
    formfields = {
        '__VIEWSTATE': VIEWSTATE,              # put the value in this variable
        '__EVENTVALIDATION': EVENTVALIDATION,  # put the value in this variable
        'ctl00$ContentPlaceHolder1$DdlprojectDistrict': search_item,  # this is where your city name changes in each iteration
        # In the question's form details this was fixed to Agra, so only one
        # city was scraped; the loop now posts each city in turn.
    }
    res = requests.post(url, data=formfields, headers=headers).text
    result_soup = BeautifulSoup(res, "html.parser")  # separate name, so the first page's soup is not overwritten

    get_list = result_soup.find_all('option')  # list of all <option> tags in the response
    for element in get_list:
        cities = element["value"]

    get_details = result_soup.find_all("table", attrs={"id": "ContentPlaceHolder1_GridView1"})
    for details in get_details:
        rows = details.find_all("tr")[1:]  # skip the header row
        for tds in rows:
            td = tds.find_all("td")[1]     # second column of the results grid
            rera = td.find_all("span")
            rnumber = ""
            for num in rera:
                rnumber = num.text         # keeps the text of the last <span> in the cell
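The loop above depends on two things read from the form page: the hidden state fields and the list of dropdown options. The following is a minimal offline sketch of that step, not the answer's own method. It uses only the standard library's `html.parser` instead of BeautifulSoup so it runs without a network connection or extra packages. The element ID and field name are copied from the code above; `FormStateParser`, `build_payloads`, and `SAMPLE_HTML` are hypothetical names and data made up for illustration.

```python
from html.parser import HTMLParser

# ID and POST field name taken from the code above.
DROPDOWN_ID = "ContentPlaceHolder1_DdlprojectDistrict"
DROPDOWN_NAME = "ctl00$ContentPlaceHolder1$DdlprojectDistrict"


class FormStateParser(HTMLParser):
    """Collects hidden <input> fields and the option values of one <select>."""

    def __init__(self, select_id):
        super().__init__()
        self.select_id = select_id
        self.hidden = {}    # hidden input name -> value
        self.options = []   # non-empty option values inside the target <select>
        self._in_select = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type") == "hidden" and attrs.get("name"):
            self.hidden[attrs["name"]] = attrs.get("value") or ""
        elif tag == "select":
            self._in_select = attrs.get("id") == self.select_id
        elif tag == "option" and self._in_select and attrs.get("value"):
            self.options.append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._in_select = False


def build_payloads(html):
    """Return one POST payload per dropdown option, each carrying the hidden fields."""
    parser = FormStateParser(DROPDOWN_ID)
    parser.feed(html)
    return [{**parser.hidden, DROPDOWN_NAME: value} for value in parser.options]


# Made-up page fragment standing in for the real form (illustrative values only).
SAMPLE_HTML = """
<form>
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz" />
  <select id="ContentPlaceHolder1_DdlprojectDistrict">
    <option value="">--Select--</option>
    <option value="Agra">Agra</option>
    <option value="Aligarh">Aligarh</option>
  </select>
</form>
"""

payloads = build_payloads(SAMPLE_HTML)
for payload in payloads:
    print(payload[DROPDOWN_NAME], sorted(payload))
```

Each payload could then be passed as `data=` to `requests.post`. Note that this sketch posts each option's `value` attribute and skips options with an empty value, whereas the code above posts the option's visible text; which of the two the server expects is an assumption to check against the real form.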