If I go directly to the address, it redirects and loads the page I want, but if I try the code below, I get error code 500:
import sys
import urllib
import urllib2

if len(sys.argv) > 1:
    molecule = sys.argv[1]
else:
    print "Error: no molecule requested"
    sys.exit()

# 1. Create the redirect handler
redirectionHandler = urllib2.HTTPRedirectHandler()
# 2. Apply the handler to an opener
opener = urllib2.build_opener(redirectionHandler)
# 3. Install the opener
urllib2.install_opener(opener)

# Passing data= makes urllib2 send this as a POST request
request = urllib2.Request("http://cccbdb.nist.gov/getform.asp", data=urllib.urlencode({'formula': molecule, "submit1": "Submit"}))
response = urllib2.urlopen(request)
To reproduce, run the program with something like python program.py ch4 and it throws the error. Simply visiting the link works fine, for example http://cccbdb.nist.gov/getform.asp?formula=ch4
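One difference worth spelling out: the link puts the field in the query string, so it is sent as a GET, while the script passes data= and so sends a POST. A minimal sketch of the two request shapes follows. It is written for Python 3, where urllib2 became urllib.request, and the helper names build_get_request and build_post_request are made up for illustration. Nothing here contacts the server, and whether the server treats the two differently is not something this sketch can show.

```python
from urllib.parse import urlencode
from urllib.request import Request

FORM_URL = "http://cccbdb.nist.gov/getform.asp"


def build_get_request(formula):
    """Same shape as the link: the field travels in the query string (GET)."""
    return Request(FORM_URL + "?" + urlencode({"formula": formula}))


def build_post_request(formula):
    """Same shape as the script: the fields travel in the request body (POST)."""
    body = urlencode({"formula": formula, "submit1": "Submit"}).encode("ascii")
    return Request(FORM_URL, data=body)


if __name__ == "__main__":
    # Only inspects the request objects; no network traffic is generated.
    for req in (build_get_request("ch4"), build_post_request("ch4")):
        print(req.get_method(), req.full_url, req.data)
```

Printing the method and body of each request before sending it is a cheap way to confirm that the script really submits what the form would.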
What I want to do is fill in the form on this page and then load the results page. That page contains this:
<FORM action="getform.asp" method=POST id=form1 >
<INPUT type="text" id=text1 name=formula
VALUE='CH4'
></P>
<INPUT type="submit" value="Submit" id=submit1 name=submit1>
</FORM>
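To double-check which fields and which target URL this form implies, the markup can be parsed rather than read by eye. The sketch below uses Python 3's standard html.parser. The class FormFieldParser and the function describe_form are hypothetical names, and PAGE_URL is an assumption taken from the page opened in the mechanize attempt further down.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: the form is served from the page opened in the mechanize attempt.
PAGE_URL = "http://cccbdb.nist.gov/geom1.asp"

FORM_HTML = """<FORM action="getform.asp" method=POST id=form1 >
<INPUT type="text" id=text1 name=formula
VALUE='CH4'
></P>
<INPUT type="submit" value="Submit" id=submit1 name=submit1>
</FORM>"""


class FormFieldParser(HTMLParser):
    """Collects the action, method, and named inputs of the first form seen."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "form" and self.method is None:
            self.action = attrs.get("action")
            self.method = (attrs.get("method") or "GET").upper()
        elif tag == "input" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def describe_form(html, page_url):
    """Return (absolute target URL, HTTP method, default field values)."""
    parser = FormFieldParser()
    parser.feed(html)
    parser.close()
    return urljoin(page_url, parser.action or ""), parser.method, parser.fields


if __name__ == "__main__":
    print(describe_form(FORM_HTML, PAGE_URL))
```

For the markup above this reports a POST to getform.asp on the same host with the fields formula and submit1, which matches what the urllib2 script sends.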
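When I submit the POST data directly from that page, it gives me the content I want. When I use urllib2, I get error 500. I have tried this without setting up the redirect handler, and I have also tried catching the error and ignoring it while still reading whatever content comes back (as suggested in another answer).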
$ wget http://cccbdb.nist.gov/getform.asp?formula=ch4
also does not work.
How can I get the contents of this page in Python?
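For reference, the "catch the error but keep the content" attempt mentioned above can be sketched as follows. This is a Python 3 version (urllib.error.HTTPError instead of urllib2.HTTPError). The function name fetch_even_on_error and the stand-in fake_500_opener are hypothetical, and the stand-in only simulates a 500 response offline; it says nothing about what the real server returns.

```python
import io
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_even_on_error(request, opener=urlopen):
    """Return (status, body), keeping the body even when the status is an error."""
    try:
        response = opener(request)
    except HTTPError as err:
        # HTTPError doubles as a response object, so its body can still be read.
        return err.code, err.read()
    try:
        return response.status, response.read()
    finally:
        response.close()


def fake_500_opener(request):
    """Offline stand-in for a server answering 500 with a body (made-up data)."""
    raise HTTPError("http://example.invalid/", 500, "Internal Server Error",
                    {}, io.BytesIO(b"<html>error details</html>"))


if __name__ == "__main__":
    print(fetch_even_on_error("http://example.invalid/", opener=fake_500_opener))
```

Reading the body of the 500 response is mainly useful for diagnosis, since an ASP error page sometimes states what the server objected to.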
Edit:
I just tried this with mechanize and I get the same error:
import mechanize
import cookielib
from BeautifulSoup import BeautifulSoup
import html2text
import sys
if len(sys.argv) > 1:
    molecule = sys.argv[1]
else:
    print "Error: no molecule requested"
    sys.exit()
# Browser
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
#br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follow refresh 0, but do not hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
# The site we will navigate into, handling its session
br.open('http://cccbdb.nist.gov/geom1.asp')
# Select the first (index zero) form
br.select_form(nr=0)
br.form["formula"] = molecule
br.submit()
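The same flow (load the form page first so any session cookie is stored, then post the form with a browser-like User-agent and a Referer) can also be written with only the Python 3 standard library. This is a sketch under assumptions, not a confirmed fix: that the server cares about cookies or the Referer header is a guess, the names make_session_opener, build_form_post, and lookup are made up, and the shortened User-agent string is a placeholder.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener

FORM_PAGE = "http://cccbdb.nist.gov/geom1.asp"    # page that shows the form
POST_URL = "http://cccbdb.nist.gov/getform.asp"   # the form's action target


def make_session_opener(user_agent="Mozilla/5.0"):
    """Build an opener that stores cookies and sends a custom User-agent."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    opener.addheaders = [("User-agent", user_agent)]
    return opener, jar


def build_form_post(formula):
    """POST request carrying the form fields, with the form page as Referer."""
    body = urlencode({"formula": formula, "submit1": "Submit"}).encode("ascii")
    return Request(POST_URL, data=body, headers={"Referer": FORM_PAGE})


def lookup(formula, opener):
    """Visit the form page (to collect cookies), then submit the form."""
    opener.open(FORM_PAGE).close()
    with opener.open(build_form_post(formula)) as response:
        return response.read()


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        session_opener, _ = make_session_opener()
        print(lookup(sys.argv[1], session_opener))
    else:
        print("Error: no molecule requested")
```

Keeping the opener and its cookie jar together makes it possible to inspect the jar after the first request and see whether the site set a session cookie at all.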