mechanize cannot read a form that contains a disabled SubmitControl with no value

Time: 2012-02-12 15:33:10

Tags: python mechanize

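I'm trying to use mechanize (v0.2.5) to work with a form on a page that has a disabled image as one of its form elements. When I try to select the form, mechanize raises AttributeError: control 'test' is disabled, where test is the name of the disabled control. For example: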

import mechanize

br = mechanize.Browser(factory=mechanize.RobustFactory())
br.open("http://whatever...")
br.select_form(nr=0)

This results in the following stack trace:

    br.select_form(nr=0)
  File "build\bdist.win32\egg\mechanize\_mechanize.py", line 499, in select_form
  File "build\bdist.win32\egg\mechanize\_html.py", line 544, in __getattr__
  File "build\bdist.win32\egg\mechanize\_html.py", line 557, in forms
  File "build\bdist.win32\egg\mechanize\_html.py", line 237, in forms
  File "build\bdist.win32\egg\mechanize\_form.py", line 844, in ParseResponseEx
  File "build\bdist.win32\egg\mechanize\_form.py", line 1017, in _ParseFileEx
  File "build\bdist.win32\egg\mechanize\_form.py", line 2735, in new_control
  File "build\bdist.win32\egg\mechanize\_form.py", line 2336, in __init__
  File "build\bdist.win32\egg\mechanize\_form.py", line 1221, in __setattr__
AttributeError: control 'test' is disabled

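Looking at the mechanize source code, it appears this error is always raised when a form element evaluates to a mechanize.SubmitControl that is disabled and has no predefined value attribute. For example, the following form raises the same error: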

<form action="http://whatever" method="POST">
    <input name="test" type="submit" disabled="disabled" />
</form>

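I'm not sure whether this should count as a bug, but either way, is there a workaround? For example, is there a way to alter the target page's HTML to enable the disabled control before calling br.select_form()?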

Edit

I have submitted a patch to mechanize that fixes this issue.

2 answers:

Answer 0: (score: 8)

Sadly, it has been over a year and upstream mechanize has still not merged the pull request.

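In the meantime, you can use this monkey patch I wrote to work around the problem without manually installing a patched version. Hopefully the bug will be fixed when (if) 0.2.6 is released, so the patch is only applied to versions 0.2.5 and older.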

def monkeypatch_mechanize():
    """Work-around for a mechanize 0.2.5 bug. See: https://github.com/jjlee/mechanize/pull/58"""
    import mechanize
    if mechanize.__version__ < (0, 2, 6):
        from mechanize._form import SubmitControl, ScalarControl

        def __init__(self, type, name, attrs, index=None):
            ScalarControl.__init__(self, type, name, attrs, index)
            # IE5 defaults SUBMIT value to "Submit Query"; Firebird 0.6 leaves it
            # blank, Konqueror 3.1 defaults to "Submit".  HTML spec. doesn't seem
            # to define this.
            if self.value is None:
                if self.disabled:
                    self.disabled = False
                    self.value = ""
                    self.disabled = True
                else:
                    self.value = ""
            self.readonly = True

        SubmitControl.__init__ = __init__

Answer 1: (score: 0)

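This is definitely a bug. Reporting it upstream, writing a patch, submitting it upstream, and using a patched version in the meantime is quite the right way to handle it. (Thank you for choosing that route.)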

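As you mentioned, another way is to handle it by preprocessing the source HTML (this can be useful if you are in a hurry, or cannot or do not want to use a patched version for some reason, but note that this workaround does not help the community). For the preprocessing, any suitable approach will do, from str.replace() to DOM-level processing with BeautifulSoup or lxml.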
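
A rough sketch of the preprocessing route, in case it helps. The helper names enable_disabled_submits and open_with_enabled_submits are made up for this example, and the regexes are a simplification: they only look at <input> tags of type submit or image and assume no ">" appears inside an attribute value. The second helper relies on the response.get_data() / response.set_data() / br.set_response() pattern that mechanize provides for editing a page before it is parsed, and assumes Python 2 with mechanize 0.2.5, where get_data() returns a str.

```python
import re

# Matches a whole <input ...> tag (assumes no '>' inside attribute values).
_INPUT_TAG = re.compile(r"<input\b[^>]*>", re.IGNORECASE)
# Matches type=submit or type=image, quoted or unquoted.
_SUBMIT_TYPE = re.compile(r"""\btype\s*=\s*["']?(?:submit|image)\b""", re.IGNORECASE)
# Matches a 'disabled' attribute, bare or with a value.
_DISABLED_ATTR = re.compile(
    r"""\s+disabled(?=[\s=/>])(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+))?""",
    re.IGNORECASE,
)


def enable_disabled_submits(html):
    """Return html with 'disabled' stripped from submit/image inputs only."""
    def fix_tag(match):
        tag = match.group(0)
        if _SUBMIT_TYPE.search(tag):
            return _DISABLED_ATTR.sub("", tag)
        return tag  # leave every other input untouched

    return _INPUT_TAG.sub(fix_tag, html)


def open_with_enabled_submits(browser, url):
    """Open url and rewrite the response before mechanize parses its forms.

    Hypothetical helper; 'browser' is expected to be a mechanize.Browser.
    """
    browser.open(url)
    response = browser.response()  # copy of the current response
    response.set_data(enable_disabled_submits(response.get_data()))
    browser.set_response(response)  # forms are parsed from the new data
```

With these in place, calling open_with_enabled_submits(br, "http://whatever...") instead of br.open(...) should let br.select_form(nr=0) go through. Keep in mind the control is then treated as enabled, so it can take part in the submission, which differs from what a real browser would do with the original page.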