Question

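I'm not sure why hosting this simple code on Google AppEngine returns a server error whenever any query is submitted to the form. The problem seems to be the line html = urllib2.urlopen("http://google.com/search?q=" + q).read(), since the code works fine without it.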

import webapp2
import urllib2


form="""
<form action="/process">
    <input name="q">
    <input type="submit">
</form>
"""


class MainHandler(webapp2.RequestHandler):
    def get(self):
        self.response.out.write(form)


class ProcessHandler(webapp2.RequestHandler):
    def get(self):
        q = self.request.get("q")
        html = urllib2.urlopen("http://google.com/search?q=" + q).read()
        self.response.out.write(html)


app = webapp2.WSGIApplication([('/', MainHandler),
                               ('/process', ProcessHandler)],
                               debug=True)

This is the error returned:

Error: Server Error
The server encountered an error and could not complete your request.

If the problem persists, please report your problem and mention this error message and the query that caused it.

Answer 1

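www.google.com probably does not accept direct connections like this and drops connections from certain user agents. In a plain Python environment you can change the user-agent string, but I don't think that is possible through Google App Engine.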
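
One way to check whether the request is really being rejected is to catch the exception and look at the status code, instead of letting it surface as a generic server error. A minimal sketch, assuming a hypothetical helper named describe_fetch (not part of the original code); the import fallback is only there so the snippet also runs on Python 3, where urllib2 was split into urllib.request and urllib.error:

```python
try:
    # Python 2, as used in the question
    from urllib2 import urlopen, HTTPError
except ImportError:
    # Python 3 equivalents
    from urllib.request import urlopen
    from urllib.error import HTTPError


def describe_fetch(url, opener=urlopen):
    """Hypothetical helper: return the page body, or a short message
    describing the HTTP error if the remote server rejects the request."""
    try:
        return opener(url).read()
    except HTTPError as e:
        # e.code is the HTTP status (e.g. 403), e.msg the reason phrase
        return "Upstream returned HTTP %d: %s" % (e.code, e.msg)
```

Writing the return value of describe_fetch to the response in ProcessHandler should show the upstream status in the browser rather than the generic error page.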

Answer 2

Google is returning a 403 for your search string:

>>> import urllib2
>>> html = urllib2.urlopen("http://google.com/search?q=Test").read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 442, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 629, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

But this works:

html = urllib2.urlopen("http://google.com").read()

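So it looks like Google is trying to block this kind of search. As the other poster suggested, changing the user-agent string may stop the 403. Pick something common!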

I just tested with a Mozilla user agent set, and I can get what I think are the results you are looking for:

import urllib2
headers = { 'User-Agent' : 'Mozilla/5.0' }
req = urllib2.Request('http://google.com/search?q=Test', None, headers)
html = urllib2.urlopen(req).read()
print html
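
To apply this inside the original ProcessHandler, the query would also need to be URL-encoded before it is appended, since a raw q containing spaces or & produces a malformed URL. A minimal sketch, assuming a hypothetical helper named build_search_request and a SEARCH_URL constant (neither is in the original code); the import fallback is only there so the snippet also runs on Python 3:

```python
try:
    # Python 2, as used in the question
    from urllib import quote_plus
    from urllib2 import Request
except ImportError:
    # Python 3 equivalents
    from urllib.parse import quote_plus
    from urllib.request import Request

SEARCH_URL = "http://google.com/search?q="  # URL taken from the question


def build_search_request(q, user_agent="Mozilla/5.0"):
    """Hypothetical helper: build a Request for the search URL with the
    query URL-encoded and a browser-like User-Agent header set."""
    if not isinstance(q, bytes):
        # Form values arrive as text; encode so non-ASCII input quotes cleanly
        q = q.encode("utf-8")
    url = SEARCH_URL + quote_plus(q)
    return Request(url, None, {"User-Agent": user_agent})
```

In the handler, html = urllib2.urlopen(build_search_request(q)).read() would then take the place of the original urlopen line. Whether Google keeps accepting any particular user-agent string is not guaranteed.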

Server error when using urllib2 on Google AppEngine
