Here is my small mechanize script:
br.open('http://tumblr.com/customize')
print br.response().read()
print br.form['edit_tumblelog[cname]'] # there definitely is edit_tumblelog
# and br.form['edit_tumblelog[enable_cname]'] works fine
Output:
...
<br/>
<input type="text" class="text_field" style="width:275px; min-width:0px;
margin:6px 0px; border:solid 1px #d2d2d2;
"
name="cname" id="cname"
onchange="form_changed = true;"
value="blog.yay.com"
/>
...
Traceback (most recent call last):
  File "/tmp/temp_textmate.W6p5gh", line 51, in <module>
    print br.form['edit_tumblelog[cname]']
  File "/Library/Python/2.6/site-packages/ClientForm-0.2.10-py2.6.egg/ClientForm.py", line 2891, in __getitem__
  File "/Library/Python/2.6/site-packages/ClientForm-0.2.10-py2.6.egg/ClientForm.py", line 3222, in find_control
  File "/Library/Python/2.6/site-packages/ClientForm-0.2.10-py2.6.egg/ClientForm.py", line 3306, in _find_control
ClientForm.ControlNotFoundError: no control matching name 'edit_tumblelog[cname]'
What am I doing wrong?
Answer 0 (score: 8)
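Found the problem. It is a bug in mechanize's HTML parser: after a <br/> occurs, it somehow ignores the next tag, whereas <br /> works fine. My solution was to replace those manually:
response = br.response()
response.set_data(response.get_data().replace("<br/>", "<br />")) #Python mechanize is broken, fixing it.
br.set_response(response)
Obviously, a better solution would be to re.sub() all tags that have no space before the />.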
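The re.sub() idea mentioned above could be sketched roughly as follows. This is only a minimal sketch, not a tested fix for mechanize itself: the helper name normalize_self_closing is hypothetical, and the pattern simply inserts a space before any "/>" that is not already preceded by whitespace. It is a blunt textual substitution, so it would also touch a "/>" that happens to appear inside a script or an attribute value.

```python
import re

# Matches "/>" only when the character before it is not whitespace.
_TIGHT_SELF_CLOSE = re.compile(r'(?<!\s)/>')


def normalize_self_closing(html):
    """Insert a space before every "/>" that lacks one (hypothetical helper)."""
    return _TIGHT_SELF_CLOSE.sub(' />', html)


# Assumed usage with the calls from the answer above (not executed here):
#   response = br.response()
#   response.set_data(normalize_self_closing(response.get_data()))
#   br.set_response(response)
```

Tags that already have whitespace (including a line break) before the "/>" are left untouched, so running it twice does no harm.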
Answer 1 (score: 6)
This may be of interest to someone:
import mechanize
br = mechanize.Browser(factory=mechanize.RobustFactory())
This should solve the problem with the HTML parser.