我是python的新手。我试图建立一个可以在使用aspx搜索表单的网站上执行搜索的机器人,我试图搜索表单然后将结果保存到文件。
这是我的剧本:
import urllib
from bs4 import BeautifulSoup
import urllib.request
from urllib.request import urlopen
headers = {
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17',
'Content-Type': 'application/x-www-form-urlencoded',
'Accept-Encoding': 'gzip,deflate,sdch',
'Accept-Language': 'en-US,en;q=0.8',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
}
class MyOpener(urllib.request.FancyURLopener):
version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'
myopener = MyOpener()
url = 'http://legistar.council.nyc.gov/Legislation.aspx'
# first HTTP request without form data
f = myopener.open(url)
soup = BeautifulSoup(f)
lastfocus = soup.select("#__LASTFOCUS")[0]['value']
eventtarget = soup.select("#__EVENTTARGET")[0]['value']
eventargument = soup.select("#__EVENTARGUMENT")[0]['value']
viewstate = soup.select("#__VIEWSTATE")[0]['value']
formFields = (
(r'__LASTFOCUS', lastfocus),
(r'__EVENTTARGET', eventtarget),
(r'__EVENTARGUMENT', eventargument),
(r'__VIEWSTATE', viewstate),
(r'ctl00_RadScriptManager1_TSM', ''),
(r'ctl00_tabTop_ClientState', ''),
(r'ctl00_ContentPlaceHolder1_menuMain_ClientState', ''),
(r'ctl00_ContentPlaceHolder1_gridMain_ClientState', ''),
# Check boxes
(r'ctl00$ContentPlaceHolder1$chkID', 'on'), # file number
(r'ctl00$ContentPlaceHolder1$chkText', 'on'), # Legislative text
(r'ctl00$ContentPlaceHolder1$chkAttachments', 'on'), # attachement
# etc. (not all listed)
(r'ctl00$ContentPlaceHolder1$txtSearch', 'york'), # Search text
(r'ctl00$ContentPlaceHolder1$lstYears', '2014'), # Years to include
(r'ctl00$ContentPlaceHolder1$lstTypeBasic', 'All Types'), #types to include
(r'ctl00$ContentPlaceHolder1$btnSearch', 'Search Legislation') # Search button itself
)
encodedFields = urllib.parse.urlencode(formFields)
# second HTTP request with form data
f = myopener.open(url, encodedFields)
try:
# actually we'd better use BeautifulSoup once again to
# retrieve results(instead of writing out the whole HTML file)
# Besides, since the result is split into multipages,
# we need send more HTTP requests
fout = open('tmp.html', 'wb')
except:
print('Could not open output file\n')
fout.writelines(f.readlines())
fout.close()
执行没有任何错误。但是当我打开tmp.html文件时,我看不到实际网站上显示的结果。
结果如下:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org /TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>
Error
</title></head>
<body>
<form name="form1" method="post" action="Error.aspx" id="form1">
<div>
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="ND1u0lOZH65sNTWWoa6wLYsEtU6yeI938ytDgbd2dC167Gk8a/1RonXoednpTu74caJ8DocoE4ewDkNe6u02VlFhiTlr5MevcRRE7CVvClRleCWGYiPME3cqJWvjA8uv" />
</div>
<div>
<input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="AB827D4F" />
</div>
<div>
<h2>
Server Error</h2>
<h4>
The server encountered a temporary error and could not complete your request.</h4>
<h4>
Please <a href="Default.aspx">try again</a> in 30 seconds.</h4>
</div>
</form>
</body>
</html>
如何让脚本返回我要查找的结果?
非常感谢任何帮助。
答案 0 :(得分:1)
此代码完美无缺。
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://legistar.council.nyc.gov/Legislation.aspx")
# Alternatively, link directly to the form:
# driver.get("https://www.icsi.in/student/Members/MemberSearch.aspx?SkinSrc=%5BG%5DSkins/IcsiTheme/IcsiIn-Bare&ContainerSrc=%5BG%5DContainers/IcsiTheme/NoContainer")
# Locate the elements.
first = driver.find_element_by_id("ctl00_ContentPlaceHolder1_txtSearch")
search = driver.find_element_by_id("ctl00_ContentPlaceHolder1_btnSearch")
# Input the data and click submit.
first.send_keys("York")
search.click()