Question

I deleted my previous post because it was unclear.

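This is my first post on Stack Overflow. For my problem I read the post "Python requests to an asp.net page" and also "Data Scraping, aspx". I found what I was looking for, but I need a little help.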

My problem is that I want to scrape the website http://up-rera.in/, which is a dynamic ASPX site. Inspecting the page elements reveals another link, http://upreraportal.cloudapp.net/View_projects.aspx, which is also an ASPX page.
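
ASP.NET WebForms pages like this keep their state in hidden inputs (__VIEWSTATE, __VIEWSTATEGENERATOR, __EVENTVALIDATION), and a search postback only works if those values are sent back. A minimal sketch of collecting every hidden field from a page is below; `get_hidden_fields` is a hypothetical helper name and the sample HTML is made up for illustration, not copied from the real site.

```python
from bs4 import BeautifulSoup


def get_hidden_fields(html):
    """Return a dict of every <input type="hidden"> name -> value in the page.

    On ASP.NET WebForms pages this includes __VIEWSTATE, __VIEWSTATEGENERATOR
    and __EVENTVALIDATION, which must be posted back unchanged.
    """
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    for tag in soup.find_all("input", attrs={"type": "hidden"}):
        name = tag.get("name")
        if name:  # unnamed inputs are never submitted by a browser
            fields[name] = tag.get("value", "")
    return fields


# Made-up sample markup shaped like an ASP.NET form
sample = """
<form>
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" value="4F1A7E70" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="ctl00$ContentPlaceHolder1$txtProject" value="" />
</form>
"""
print(get_hidden_fields(sample))
# → {'__VIEWSTATE': 'abc123', '__VIEWSTATEGENERATOR': '4F1A7E70', '__EVENTVALIDATION': 'xyz789'}
```

Collecting all hidden inputs, rather than naming each one, means any extra state field the page adds is still posted back.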

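My question is how to loop through all the options in the dropdown menu and click Search to get the page content for each one. For example, I am able to scrape Agra and get its page details.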
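
Looping over the dropdown starts with reading its options from the initial page. The sketch below assumes the select element's id is `ContentPlaceHolder1_DdlprojectDistrict` (taken from the answer's code); the helper name `get_dropdown_options`, the "--Select--" placeholder and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def get_dropdown_options(html, select_id):
    """Return (value, text) pairs for the <option>s of <select id=select_id>.

    Options whose value is empty (e.g. a "--Select--" placeholder) are skipped.
    An <option> without a value attribute falls back to its text, as browsers do.
    """
    soup = BeautifulSoup(html, "html.parser")
    select = soup.find("select", id=select_id)
    if select is None:
        return []
    options = []
    for option in select.find_all("option"):
        text = option.get_text(strip=True)
        value = option.get("value", text).strip()
        if value:
            options.append((value, text))
    return options


# Made-up sample markup; the real district list comes from the live page
sample = """
<select name="ctl00$ContentPlaceHolder1$DdlprojectDistrict"
        id="ContentPlaceHolder1_DdlprojectDistrict">
  <option value="">--Select--</option>
  <option value="Agra">Agra</option>
  <option value="Aligarh">Aligarh</option>
</select>
"""
for value, text in get_dropdown_options(sample, "ContentPlaceHolder1_DdlprojectDistrict"):
    print(value, text)
```

The value, not the visible text, is what should go into the POST, because that is what a browser submits and what ASP.NET event validation checks.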

Since I am still learning, I am avoiding Selenium for fetching the page details for now.

Can anyone guide me and help me modify the code below:

prop1, prop2, prop3

Answer 1

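Try the code below. It should give you all the results you are after; it only needed a little tweaking. I scraped the different names from the dropdown menu and used them in a loop so that you can get all the data one by one. Apart from adding a few lines, I didn't change anything else. Your code would also be cleaner if you wrapped it in a function.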
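
Each pass of that loop posts the same form with only the district changed. A minimal sketch of building that payload is below; the form field names come from the answer's code, while `build_search_payload` and the sample hidden values are hypothetical.

```python
def build_search_payload(hidden_fields, district):
    """Combine the page's hidden ASP.NET fields with the visible form fields.

    Including the submit button's name/value pair is how ASP.NET WebForms
    knows which button was "clicked" on postback.
    """
    payload = dict(hidden_fields)  # copy so the caller's dict is not mutated
    payload.update({
        "ctl00$ContentPlaceHolder1$DdlprojectDistrict": district,  # changes per iteration
        "ctl00$ContentPlaceHolder1$txtProject": "",
        "ctl00$ContentPlaceHolder1$btnSearch": "Search",
    })
    return payload


hidden = {"__VIEWSTATE": "abc123", "__EVENTVALIDATION": "xyz789"}  # made-up values
for district in ["Agra", "Aligarh"]:
    payload = build_search_payload(hidden, district)
    print(payload["ctl00$ContentPlaceHolder1$DdlprojectDistrict"])
```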

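By the way, I have put those two huge strings (__VIEWSTATE and __EVENTVALIDATION) into two variables so that you don't have to worry about them, which also makes the code much slimmer. Here is the corrected code: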

import requests
from bs4 import BeautifulSoup

url = "http://upreraportal.cloudapp.net/View_projects.aspx"

# One session for all requests, so any cookies the ASP.NET site sets are sent back
session = requests.Session()
response = session.get(url).text
soup = BeautifulSoup(response, "lxml")

# Hidden ASP.NET state fields that have to be posted back with every search
VIEWSTATE = soup.select("#__VIEWSTATE")[0]['value']
VIEWSTATEGENERATOR = soup.select("#__VIEWSTATEGENERATOR")[0]['value']
EVENTVALIDATION = soup.select("#__EVENTVALIDATION")[0]['value']

headers = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
           'Content-Type': 'application/x-www-form-urlencoded',
           'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}

# Loop over the district dropdown; [:-1] drops the last option, as in the original answer
for title in soup.select("#ContentPlaceHolder1_DdlprojectDistrict [value]")[:-1]:
    search_item = title['value']  # a browser submits the option's value attribute

    formfields = {'__VIEWSTATE': VIEWSTATE,
                  '__VIEWSTATEGENERATOR': VIEWSTATEGENERATOR,
                  '__EVENTVALIDATION': EVENTVALIDATION,
                  'ctl00$ContentPlaceHolder1$DdlprojectDistrict': search_item,  # the city changes on each iteration
                  'ctl00$ContentPlaceHolder1$txtProject': '',
                  'ctl00$ContentPlaceHolder1$btnSearch': 'Search'}

    res = session.post(url, data=formfields, headers=headers).text
    result_soup = BeautifulSoup(res, "html.parser")

    # Results grid: skip the header row, then print the <span> texts in each row's second cell
    for details in result_soup.find_all("table", attrs={"id": "ContentPlaceHolder1_GridView1"}):
        for row in details.find_all("tr")[1:]:
            cells = row.find_all("td")
            if len(cells) < 2:  # skip pager or otherwise short rows
                continue
            for num in cells[1].find_all("span"):
                print(search_item, num.text)
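
The answer suggests wrapping the scraper in a function. A minimal sketch of that refactor follows, assuming the same URL, element ids and form field names as the code above; `parse_rera_numbers` and `scrape_all_districts` are hypothetical names. Like the answer, it reuses the hidden fields from the first GET for every POST. Unlike the answer, it skips options with an empty value instead of using `[:-1]`, which is an assumption about which option is the placeholder. It uses the built-in "html.parser" so lxml is not required, and takes the session as a parameter so it works with a `requests.Session` or anything offering the same `get`/`post` interface.

```python
from bs4 import BeautifulSoup

URL = "http://upreraportal.cloudapp.net/View_projects.aspx"
DISTRICT_SELECT_ID = "ContentPlaceHolder1_DdlprojectDistrict"
RESULTS_TABLE_ID = "ContentPlaceHolder1_GridView1"


def parse_rera_numbers(html):
    """Return the <span> texts in the second cell of each data row of the results grid."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=RESULTS_TABLE_ID)
    if table is None:  # no results grid on the page
        return []
    numbers = []
    for row in table.find_all("tr")[1:]:  # skip the header row
        cells = row.find_all("td")
        if len(cells) < 2:  # pager or malformed rows
            continue
        numbers.extend(span.get_text(strip=True) for span in cells[1].find_all("span"))
    return numbers


def scrape_all_districts(session):
    """Post the search form once per district and map district -> RERA numbers.

    Reusing one session keeps any cookies the ASP.NET site sets.
    """
    page = session.get(URL).text
    soup = BeautifulSoup(page, "html.parser")
    # Every hidden input (view state, event validation, ...) goes back in each POST
    hidden = {tag["name"]: tag.get("value", "")
              for tag in soup.find_all("input", attrs={"type": "hidden"})
              if tag.get("name")}
    # Assumption: the placeholder option is the one with an empty value
    districts = [opt["value"]
                 for opt in soup.select(f"#{DISTRICT_SELECT_ID} option[value]")
                 if opt["value"].strip()]
    results = {}
    for district in districts:
        payload = dict(hidden)
        payload.update({
            "ctl00$ContentPlaceHolder1$DdlprojectDistrict": district,
            "ctl00$ContentPlaceHolder1$txtProject": "",
            "ctl00$ContentPlaceHolder1$btnSearch": "Search",
        })
        response = session.post(URL, data=payload)
        response.raise_for_status()
        results[district] = parse_rera_numbers(response.text)
    return results
```

To run it against the site, `import requests` and call `scrape_all_districts(requests.Session())`; whether the portal still answers at that URL has not been checked here. Keeping the parsing in its own function also lets you test it on saved HTML without touching the network.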

How to loop through the dropdown menu of a dynamic ASPX website and scrape the data using Python requests and BeautifulSoup
