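I am new to Python. How do I scrape a website that paginates through __doPostBack AJAX requests? I have already scraped a few sites that use __doPostBack requests, but on this one I cannot get my code to move to the next page.
I tried regular expressions with requests: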
import re
import requests

url = "https://inmatelocator.cdcr.ca.gov/Results.aspx"
headers = {}
headers["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3"
headers["Accept-Encoding"] = "gzip, deflate, br"
headers["Accept-Language"] = "en-GB,en-US;q=0.9,en;q=0.8"
headers["Cache-Control"] = "max-age=0"
headers["Connection"] = "keep-alive"
headers["Content-Length"] = "509"
headers["Content-Type"] = "application/x-www-form-urlencoded"
headers["Cookie"] = "__utmc=158387685; __utmz=158387685.1564555174.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); _ga=GA1.2.1411681485.1564555174; _gid=GA1.2.1770415657.1564555174; visited=yes; ASP.NET_SessionId=hveu1jjwinc2jbvgmxlhfjeb; __utma=158387685.1411681485.1564555174.1564555174.1564564262.2; __utmt=1; __utmb=158387685.6.10.1564564262"
headers["Host"] = "inmatelocator.cdcr.ca.gov"
headers["Origin"] = "https://inmatelocator.cdcr.ca.gov"
headers["Referer"] = "https://inmatelocator.cdcr.ca.gov/search.aspx"
headers["Upgrade-Insecure-Requests"] = "1"
headers["User-Agent"] = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36"
payload = {}
payload["__EVENTTARGET"] = ""
payload["__EVENTARGUMENT"] = ""
payload["__VIEWSTATE"] = "/wEPDwUKMTkzMjEyNjY0NWRk+VGMFvcoZ1pXgYxddY+z7NJROEM="
payload["__VIEWSTATEGENERATOR"] = "BBBC20B8"
payload["__EVENTVALIDATION"] = "/wEWBwK6sbiGDwKfnPGUDgKJ8PZdAqf4qBUC1Yr3wwwC+criiQECkMi7qA6SpjQfXqabwMzzGMlWuON873C8aw=="
payload["ctl00$LocatorPublicPageContent$txtCDCNumber"] = ""
payload["ctl00$LocatorPublicPageContent$txtLastName"] = "a"
payload["ctl00$LocatorPublicPageContent$txtFirstName"] = ""
payload["ctl00$LocatorPublicPageContent$txtMiddleName"] = ""
payload["ctl00$LocatorPublicPageContent$btnSearch"] = "Search for Inmate"
# Submit the search form; the response contains the first page of results
r = requests.post(url, headers=headers, data=payload)
# Capture the href of each result link
all_links = re.findall(r'''<td><a href="(.*?)">.*?</a></td>''', r.text, re.S | re.I)
for link in all_links:
    new_url = "https://inmatelocator.cdcr.ca.gov/" + link
    print(new_url)
I cannot retrieve the records on the next page. Please help me get them.
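
For reference, here is a rough sketch of how a __doPostBack "next page" click is usually reproduced by hand: read the hidden fields (`__VIEWSTATE`, `__EVENTVALIDATION`, and so on) from the page just received instead of hard-coding them, set `__EVENTTARGET` and `__EVENTARGUMENT` to the values found in the pager link, and post the form again. The helper names (`hidden_fields`, `find_postbacks`, `postback_payload`, `crawl`) are made up for illustration, and the `Page$Next` argument is an assumption based on the usual ASP.NET GridView pager; I have not confirmed what this site's pager actually sends, so check the link's `href` in the browser first.

```python
import re
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return the hidden form fields of a page as a dict."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Matches links such as href="javascript:__doPostBack('target','argument')"
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


def find_postbacks(html):
    """Return (target, argument) pairs for every __doPostBack call in the page."""
    # The quotes are often HTML-escaped as &#39; in the page source
    return POSTBACK_RE.findall(html.replace("&#39;", "'"))


def postback_payload(html, target, argument=""):
    """Build the form data for __doPostBack(target, argument) on this page."""
    payload = hidden_fields(html)  # fresh __VIEWSTATE / __EVENTVALIDATION
    payload["__EVENTTARGET"] = target
    payload["__EVENTARGUMENT"] = argument
    return payload


def crawl(session, url, search_payload, max_pages=5):
    """Yield the HTML of each result page; `session` is a requests.Session."""
    html = session.post(url, data=search_payload).text
    for _ in range(max_pages):
        yield html
        # Assumption: the "next" pager link uses the argument "Page$Next"
        nxt = [(t, a) for t, a in find_postbacks(html) if a == "Page$Next"]
        if not nxt:
            break
        target, argument = nxt[0]
        html = session.post(url, data=postback_payload(html, target, argument)).text
```

It would be called along the lines of `for page in crawl(requests.Session(), url, payload): ...`, with the same link-extraction regex applied to each `page`. Using a `requests.Session` keeps the `ASP.NET_SessionId` cookie between requests, so the hard-coded `Cookie` header should not be needed. The search button field (`btnSearch`) is deliberately left out of the paging request, since the browser only sends it when that button is clicked; if the server rejects the request, compare the payload with what the browser's network tab shows for a real click.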