Scraping data from ASP.NET WebForms params with Ruby and Nokogiri / Mechanize

Date: 2017-03-13 21:15:23

Tags: ruby web-scraping mechanize-ruby

I am currently trying to scrape data from a web page with Ruby, using Nokogiri and Mechanize. I want to get the list of tenders from the following link: http://www.panamacompra.gob.pa/ambientepublico/AP_BusquedaAvanzada.aspx

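Follow this procedure:

  1. Open the URL above.
  2. Once it has loaded, go to the field: Número
  3. Enter this value in the Número field: 2017-1-37-0-15-cm-011063
  4. Press the first green button: BUSCAR
  5. Scroll down to see the table with the filtered tenders

This is my code: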

    require 'rubygems'
    require 'mechanize'
    
    
    # Single agent for all requests: browser user agent, follows meta refreshes
    @m = Mechanize.new do |agent|
      agent.user_agent_alias = 'Mac Safari'
      agent.follow_meta_refresh = true
    end
    
    
    @url='http://www.panamacompra.gob.pa/ambientepublico/AP_BusquedaAvanzada.aspx'
    @payload=''
    @body_page = ''
    @search_string='2017-1-37-0-15-cm-011063'
    @viewstate=""
    
    def set_payload
        {
            'txtGSA' => '',
            'ctl00$ContentPlaceHolder1$txtNumeroAdquisicion'=> '',
            'ctl00$ContentPlaceHolder1$txtNombreAdquisicion' => '',
            'ctl00$ContentPlaceHolder1$txtNombreDemandante' => '',
            'ctl00$ContentPlaceHolder1$txtNombreDependencia' => '',
            'ctl00$ContentPlaceHolder1$txtNombreProveedor' => '',
            'ctl00$ContentPlaceHolder1$txtFechaDesde' => '13-02-2017',
            'ctl00$ContentPlaceHolder1$txtFechaHasta' => '13-03-2017',
            'ctl00$ContentPlaceHolder1$txtNombreRubro' => '',
            'ctl00_ContentPlaceHolder1_ASPxPopupControl1WS' => '0:0:-1:0:0:0:0:0:;0:0:-1:0:0:0:0:0:',
            'ctl00$ContentPlaceHolder1$ControlPaginacion$hidTotalPaginas' => '0',
            'ctl00$ContentPlaceHolder1$ControlPaginacion$hidNumeroPagina' => '1',
            'ctl00$ContentPlaceHolder1$ControlPaginacion$hidOrigen' => '0',
            'ctl00$ContentPlaceHolder1$ControlPaginacion$hidTotalFilas' => '1',
            'ctl00$ContentPlaceHolder1$ControlPaginacion$hidInicioAnterior' => '1',
            'ctl00$ContentPlaceHolder1$ControlPaginacion$hidFinAnterior' => '1',
            'ctl00$ContentPlaceHolder1$ControlPaginacion$hidBloqueInicio' => '1',
            'ctl00$ContentPlaceHolder1$ControlPaginacion$hidMaxFilasPorPagina' => '20',
            'ctl00$ContentPlaceHolder1$ControlPaginacion$hidMaxPaginasPorListado' => '9',
            'ctl00$ContentPlaceHolder1$ControlPaginacion$hidCambioBloque' => 'False',
            'ctl00$ContentPlaceHolder1$ControlPaginacion$hidMostrarEstado' => 'False',
            'ctl00$ContentPlaceHolder1$ControlPaginacion$hidMostrarMensaje' => 'True',
            'ctl00$ContentPlaceHolder1$ControlPaginacion$hidValoresPorDefecto' => 'True',
            'ctl00$ContentPlaceHolder1$hidIdDependencia' => '-1',
            'ctl00$ContentPlaceHolder1$hidNombreDependencia' => '-1',
            'ctl00$ContentPlaceHolder1$hidIdOrgV' => '-1',
            'ctl00$ContentPlaceHolder1$hidIdEmpresaVenta' => '-1',
            'ctl00$ContentPlaceHolder1$hidIdEmpresaC' => '0',
            'ctl00$ContentPlaceHolder1$hidIdOrgC' => '-1',
            'ctl00$ContentPlaceHolder1$hidNombreDemandante' => '-1',
            'ctl00$ContentPlaceHolder1$hidDependencia' => '-1',
            'ctl00$ContentPlaceHolder1$hidIDRubro' => '-1',
            'ctl00$ContentPlaceHolder1$hidRedir' => '',
            'ctl00$ContentPlaceHolder1$hidRangoMaximoFecha' => '',
            'ctl00$ContentPlaceHolder1$hidIDProducto' => '-1',
            'ctl00$ContentPlaceHolder1$hidIDProductoNoIngresado' => '-1',
            'ctl00$ContentPlaceHolder1$hidNombreProducto' => '-1',
            'ctl00$ContentPlaceHolder1$hidNombreProveedor' => '-1',
            'ctl00$ContentPlaceHolder1$lstUnidadCompra' => '',
            'ctl00$ContentPlaceHolder1$lstEstado' => '0'
        }
    end
    
    @m.get @url do |page|
      page.form_with :name => "aspnetForm" do |search_form|
        @viewstate = search_form.field_with(:name => "__VIEWSTATE").value
        @payload=set_payload
        @m.post(@url,@payload).form_with :name => "aspnetForm" do |search_form_2|
            search_form_2.field_with(:name => "ctl00$ContentPlaceHolder1$txtNumeroAdquisicion").value = @search_string
            submit_button = search_form_2.button_with(:id=>"ctl00_ContentPlaceHolder1_btnBuscar")
            finish = search_form_2.submit(submit_button)
            @body_page = finish
        end
        puts Nokogiri::HTML(@body_page.body)
      end
    end
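One thing that stands out in the code above: `__VIEWSTATE` is read into `@viewstate` but never sent, so the first `@m.post(@url, @payload)` carries only the fields from `set_payload`. An ASP.NET WebForms page normally expects the hidden state fields it rendered (`__VIEWSTATE`, and usually `__VIEWSTATEGENERATOR` and `__EVENTVALIDATION`) back on every postback, together with the `name=value` pair of the button that was clicked. The sketch below, using only Ruby's standard library, shows how such a postback body could be assembled. `build_postback_params` is a hypothetical helper, and the button name `ctl00$ContentPlaceHolder1$btnBuscar` and value `Buscar` are assumptions derived from the button id in the question, not values confirmed against the site.

```ruby
require 'uri'

# Hidden fields an ASP.NET WebForms page usually renders and expects back.
# Which of them this site actually emits is an assumption; copy whatever the
# GET response contains.
STATE_FIELDS = %w[__VIEWSTATE __VIEWSTATEGENERATOR __EVENTVALIDATION].freeze

# Hypothetical helper: combine the state fields scraped from the GET response
# with the search fields and the clicked button's name/value.
def build_postback_params(state, search_fields, button_name, button_value)
  raise ArgumentError, '__VIEWSTATE is missing' if state['__VIEWSTATE'].to_s.empty?

  # Keep only the known state fields, in the order they were given
  params = state.select { |name, _| STATE_FIELDS.include?(name) }
  params.merge!(search_fields)
  params[button_name] = button_value # tells the server which button was clicked
  params
end

# Usage with placeholder values (not real tokens from the site):
params = build_postback_params(
  { '__VIEWSTATE' => 'VIEWSTATE_FROM_GET', '__EVENTVALIDATION' => 'EVENTVALIDATION_FROM_GET' },
  { 'ctl00$ContentPlaceHolder1$txtNumeroAdquisicion' => '2017-1-37-0-15-cm-011063' },
  'ctl00$ContentPlaceHolder1$btnBuscar', # assumed name, derived from the button id
  'Buscar'                               # assumed button caption
)
body = URI.encode_www_form(params) # application/x-www-form-urlencoded body
```

The resulting `params` hash could be passed to `@m.post(@url, params)` in place of `@payload`, with the real token values taken from the form returned by the initial GET.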

Why doesn't the form perform the POST? It comes back without the posted information.

Result:

    <td class="style1" align="left" valign="top">
      <input name="ctl00$ContentPlaceHolder1$txtNumeroAdquisicion" type="text" value="2017-1-37-0-15-cm-011063" id="ctl00_ContentPlaceHolder1_txtNumeroAdquisicion">
      <span class="formEjemplos2"><i>Ej.: 2008-1-027-00-08-LP-000274</i></span>
      <div id="divNumLC"></div>
    </td>

The submitted value shows up in the field, but the table of tenders does not.
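Because Mechanize's `Form#submit` already posts every field of the form, hidden ones included, a simpler route may be to skip the hand-built payload and mirror the manual steps directly: load the page, fill in Número, and submit with the BUSCAR button. This is a minimal sketch that assumes BUSCAR is an ordinary submit button. If the site triggers the search through JavaScript (for example a `__doPostBack` call or an AJAX callback; the DevExpress `ASPxPopupControl1WS` field suggests the page uses such controls), Mechanize will not run that script and the table may still be missing. `search_tenders` and `SEARCH_URL` are hypothetical names.

```ruby
# Minimal sketch, assuming BUSCAR is a plain submit button and that the form
# and field names are the ones used in the question.
SEARCH_URL = 'http://www.panamacompra.gob.pa/ambientepublico/AP_BusquedaAvanzada.aspx'.freeze

def search_tenders(numero, url = SEARCH_URL)
  require 'mechanize' # loaded here so the file can be parsed without the gem

  agent = Mechanize.new do |a|
    a.user_agent_alias = 'Mac Safari'
    a.follow_meta_refresh = true
  end

  page = agent.get(url)
  form = page.form_with(name: 'aspnetForm')
  raise 'aspnetForm not found' unless form

  form['ctl00$ContentPlaceHolder1$txtNumeroAdquisicion'] = numero

  button = form.button_with(id: 'ctl00_ContentPlaceHolder1_btnBuscar')
  raise 'BUSCAR button not found' unless button

  # Form#submit posts all fields of the form (hidden __VIEWSTATE and friends
  # included) plus the clicked button's name/value.
  form.submit(button)
end

# Usage (needs network access and the mechanize gem):
#   result = search_tenders('2017-1-37-0-15-cm-011063')
#   puts result.search('table').size
```

Keeping a single form object from the GET through to the submit avoids the intermediate POST in the original code, which is where the state fields were dropped.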

0 answers