
Question

I am writing a function for some existing Python code; the function will be passed a Mechanize browser object as a parameter.

I fill in some details in a form in the browser, then use response = browser.submit() to move the browser to a new page and collect some information from it.

Unfortunately, I occasionally get the following error:

httperror_seek_wrapper: HTTP Error 500: Internal Server Error

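I have navigated to the page in my own browser, and sure enough, I occasionally see this error there too, so I believe it is a server problem and has nothing to do with robots.txt, headers or anything similar.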

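The problem is that after the failed submit, the state of the browser object has changed and I can no longer use it. My first idea was to take a deep copy beforehand and fall back to it if something went wrong, but that raises TypeError: object.__new__(cStringIO.StringO) is not safe, use cStringIO.StringO.__new__(), as described here.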

I also tried browser.back(), but got a NoneType error.

Does anyone have a good solution for this?

Solution (thanks to karnesJ.R below):

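The excellent solution below uses the excellent requests library (docs here). requests can fill in a form and submit it via POST or GET, and, importantly, doing so does not change the state of the br object.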

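This excellent website lets you test a variety of error codes and has a form interface at the top, which is what I tested against. I create a br object on this site, then define a function that selects the form from br and extracts the relevant information, but performs the submission through requests, so that the br object is left unchanged and can be reused. The error codes make requests return junk, but they do not make br unusable.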

As mentioned below, this takes a bit more setup, but it is well worth it.

import mechanize
import requests

def testErrorCodes(br, theCodes):
    for x in theCodes:
        # Select the first form on the page currently loaded in br
        br.select_form(nr=0)

        # Read the form's target URL from mechanize, but submit with requests
        theAction = br.action
        payload = {'code': x}

        # requests.post never touches br, so an error response leaves br usable
        response = requests.post(theAction, data=payload)
        print(response.status_code)

br = mechanize.Browser()
br.set_handle_robots(False)
response = br.open("http://savanttools.com/test-http-status-codes")

testErrorCodes(br, [401, 402, 403, 404, 500, 503, 504])  # Prints the error codes

testErrorCodes(br, [404])  # The browser is still alive and well to be used again!
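The function above prints each status code once. If you also want the out-of-band submission to be retried while the server keeps answering with 5xx errors, a minimal sketch could look like this. The name post_until_ok, the max_attempts and delay parameters, and the choice to treat every status of 500 or above as retryable are assumptions for illustration, not part of the original solution. post is any callable shaped like requests.post(url, data=...), so a fake can stand in for testing; since only that callable is involved, br is never touched.

```python
import time


def post_until_ok(post, url, payload, max_attempts=5, delay=1.0):
    """Hypothetical helper: POST `payload` to `url` until the server stops
    answering with a 5xx status, or until `max_attempts` is reached.

    `post` is assumed to behave like requests.post(url, data=...) and to
    return an object with a `status_code` attribute.
    """
    response = None
    for attempt in range(1, max_attempts + 1):
        response = post(url, data=payload)
        # Assumption: treat any 5xx as transient server trouble worth retrying.
        if response.status_code < 500:
            return response
        if attempt < max_attempts:
            time.sleep(delay)  # wait a little before the next attempt
    # Out of attempts: hand back the last (still failing) response.
    return response
```

With the setup above you would call it as post_until_ok(requests.post, br.action, {'code': 500}) after br.select_form(nr=0); whatever the outcome, br stays usable.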

Answer 1

It's been a while since I've written Python, but I think I have a solution. Try this approach:

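import mechanize
import requests

try:
    response = browser.submit()  # browser: your existing mechanize.Browser
except mechanize.HTTPError:
    while True:  ## DANGER: retries forever if the server never recovers ##
        ## You will need to format and/or decode the POST for your form
        response = requests.post('http://yourwebsite.com/formlink', data=None, json=None)
        ## If the server will accept JSON formatting, this becomes trivial
        if response.status_code == accepted_code:  # accepted_code: e.g. 200
            break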

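You can find the documentation for the requests library here. Personally, I think requests is a better fit for your situation than mechanize... but it comes with more overhead, because you have to break the submission down into a raw POST, using some kind of RESTful interceptor in your browser to see what the form actually sends.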
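To illustrate what "breaking the submission down into a raw POST" involves, here is a minimal sketch using only the standard library rather than requests: a simple HTML form submission is an HTTP POST whose body is the form fields encoded as application/x-www-form-urlencoded. build_raw_post is a hypothetical helper name, the URL is the placeholder from the answer, and the field name code is borrowed from the test site used earlier in this thread; your form's real fields are whatever the interceptor shows. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_raw_post(url, fields):
    """Hypothetical helper: build (but do not send) the raw POST a browser
    would make for a simple form with the given field names and values."""
    body = urlencode(fields).encode("ascii")  # e.g. {"code": 500} -> b"code=500"
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_raw_post("http://yourwebsite.com/formlink", {"code": 500})
print(req.get_method(), req.full_url)
print(req.data)
```

Sending it with urllib.request.urlopen(req), or passing the same dict as data= to requests.post, is independent of any mechanize Browser object.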

Ultimately, though, by passing in br you are limiting yourself to however mechanize handles the browser state on br.submit().

Answer 2

I assume you want the submission to go through even if it takes several attempts.

The solution I came up with is certainly not efficient, but it should work.

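import mechanize

def do_something_in_mechanize():
    # ...insert your code here (open the page, fill in the form)...
    try:
        browser.submit()
        # ...rest of your code...
    except mechanize.HTTPError:
        # On failure, start the whole attempt over by calling the function again
        do_something_in_mechanize()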

Basically, it keeps calling the function until the whole operation completes without an HTTPError.
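Because every retry in the recursive version adds a stack frame, a long run of server errors would eventually hit Python's recursion limit (RecursionError), and there is no upper bound on attempts. An iterative variant with a cap is sketched below as an alternative; submit_with_retries, retry_on and max_attempts are hypothetical names. The callable you pass should perform one full attempt (open the page, fill in the form, call browser.submit()), mirroring how the recursive version re-runs all of do_something_in_mechanize.

```python
def submit_with_retries(attempt, retry_on=(Exception,), max_attempts=5):
    """Hypothetical helper: call `attempt()` until it returns without raising
    one of the `retry_on` exceptions, trying at most `max_attempts` times.

    Unlike the recursive version, this uses a loop, so repeated failures
    cannot exhaust Python's recursion limit.
    """
    for n in range(1, max_attempts + 1):
        try:
            return attempt()
        except retry_on:
            if n == max_attempts:
                raise  # out of attempts: let the last error propagate
```

For example, submit_with_retries(one_attempt, retry_on=(mechanize.HTTPError,), max_attempts=10), where one_attempt is your own function that reloads the page, fills in the form and submits it.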
