I am scraping data from http://www.mca.gov.in/DCAPortalWeb/dca/MyMCALogin.do?method=setDefaultProperty&mode=53
Here is the code I tried:
require 'mechanize'

uri = "http://www.mca.gov.in/DCAPortalWeb/dca/MyMCALogin.do?method=setDefaultProperty&mode=53"
#html, html_content = @mobj.get_data(uri)
agent = Mechanize.new
html_page = agent.get uri
html_form = html_page.form
# Check the radio button named 'search' whose value is '2'
html_form.radiobuttons_with(:name => 'search', :value => '2')[0].check
# This submit call is where the Mechanize::ResponseCodeError is raised
html_form.submit
puts html_page.content
Error:
var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:308:in `fetch': 500 => Net::HTTPInternalServerError for http://www.mca.gov.in/DCAPortalWeb/dca/ProsecutionDetailsSRAction.do -- unhandled response (Mechanize::ResponseCodeError)
from /var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize.rb:1281:in `post_form'
from /var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize.rb:548:in `submit'
from /var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize/form.rb:223:in `submit'
from ministry_corp_aff.rb:32:in `start'
from ministry_corp_aff.rb:52:in `<main>'
If I manually click the 3rd radio button and then submit the form, I get a .zip file. I am trying to get the data from the .xls file inside that zip file.
Answer 0 (score: 0)
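The radio button has an onclick event handler that triggers the execution of some JavaScript. In addition, clicking the submit <a> tag also causes some JavaScript to execute. That JavaScript may set some values that are sent back with the form and that the server checks.
Mechanize cannot execute JavaScript. You need Selenium WebDriver.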
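A rough sketch of what that could look like with the selenium-webdriver gem, in case it helps. The radio button selector is built from the name and value used in the question ('search' and '2'). The selector for the submit <a> tag is not known from the question, so 'a#submit' below is a hypothetical placeholder that has to be replaced after inspecting the page. The helper names select_and_submit and run_in_firefox are made up for this example.

```ruby
# Sketch only: drive a real browser so the page's own JavaScript runs.
# `driver` is expected to behave like a Selenium::WebDriver driver
# (responds to get, find_element and quit).
def select_and_submit(driver, url, radio_value, submit_css = 'a#submit')
  # 'a#submit' is an assumed placeholder selector, not taken from the page.
  driver.get(url)

  # Selector built from the name/value pair used in the question.
  radio_css = "input[type='radio'][name='search'][value='#{radio_value}']"

  # Real clicks fire the onclick handlers that Mechanize never executes.
  driver.find_element(css: radio_css).click
  driver.find_element(css: submit_css).click

  radio_css
end

# Opens Firefox, performs the clicks, and always closes the browser.
def run_in_firefox(url, radio_value)
  require 'selenium-webdriver' # gem install selenium-webdriver

  driver = Selenium::WebDriver.for(:firefox)
  begin
    select_and_submit(driver, url, radio_value)
  ensure
    driver.quit
  end
end
```

Calling run_in_firefox(uri, '2') with the uri from the question would perform the same two clicks you do by hand. Where the resulting .zip ends up depends on how the browser is configured to handle downloads, which this sketch does not set up.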