Retrieving POST data from a site in Ruby

Time: 2014-04-03 05:57:52

Tags: ruby nokogiri mechanize

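I am trying to retrieve POST data from a website. I have tried many combinations of Nokogiri, URI and Mechanize, but I only get the data from the GET request, and the content I am interested in is missing.

Below is the body I get from the site. I am looking for the content of div id="list2", which holds a table of users and their phone numbers.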

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="Description" content="Wyszukiwarka"  />
<meta name="Author" content="LR"  />
<title>Tel</title>
<link href="styleblue.css" rel="stylesheet" type="text/css" />
<script type="text/javascript" src="includes/scripts.js"></script>
<script type="text/javascript" src="includes/jquery-1.6.1.min.js"></script>
<script type="text/javascript" src="includes/jquery.form.js"></script>
<link rel="stylesheet" type="text/css" href="img/themes/blue/style.css" />
<link rel="stylesheet" type="text/css" href="img/themes/ui/smoothness/jquery-ui-1.8.13.custom.css" media="screen"/>
<script type="text/javascript" src="includes/jquery-ui-1.8.13.custom.min.js"></script>
<script type="text/javascript" src="includes/ui.datepicker-pl.js"></script>

<script type="text/javascript">
$(document).ready(function(){
gridReloadTel();
})
</script></head>
<body><table style="width: 100%; margin: 0px; padding: 0px; vertical-align:top" cellpadding="0" cellspacing="0">
  <tr class="hideen">
    <td style="width: 100%"><table cellpadding="0" cellspacing="0" style="width:100%; margin:0px; padding:0px;">
        <tr>
          <td id="top_left_login" style="height: 101px"></td>
          <td style="height: 101px"><img alt="" src="img/top.jpg" /></td>
          <td id="top_right_login" style="height: 101px"><div style="position:relative; width:194px; left:-207px; bottom:36px; text-align:right ">Czwartek&nbsp;&nbsp;&nbsp;<span style="color:#FFFFFF;">03-04-2014</span></div></td>
        </tr>
      </table></td>
  </tr>
  <tr  class="hideen">
    <td id="menu"><div >
        <img src="img/blue/mline.jpg" border="0" alt="" /><a href="index.php">Wyszukiwarka</a><img src="img/blue/mline.jpg" border="0" alt="" /><a href="aktualizacja.php">Aktualizacja danych</a><img src="img/blue/mline.jpg" border="0" alt="" /><a href="pomoc.php">Pomoc</a><img src="img/blue/mline.jpg" border="0" alt="" />     



          </div>//Content
        </div>
      <br /><br />
        <div id="list2">I LOOKING FOR THIS DIV</div>

        <br />
      </div>
      <blockquote style="font-size:10px ">
        * aktualizacje <br/>
        <img src="img/plus.gif" width="18" height="18" /> 

      </blockquote></td>
  </tr>
  <tr class="hideen">
    <td style="width: 100%"><div id="bottom" align="center"><img src="img/bzit.jpg" width="225" height="42" border="0" alt="" /></div></td>
  </tr>
</table>
</body>
</html>

When I inspect the site in Firebug, I see a GET to url/index.php and a POST to url/grid/search.php. The site is on a local network. When I open the XHR tab, where the POST to search.php is listed, I see:

Response headers:

    Connection: Keep-Alive
    Content-Type: text/html
    Date: Thu, 03 Apr 2014 05:31:44 GMT
    Keep-Alive: timeout=15, max=100
    Server: Apache
    Transfer-Encoding: chunked
    X-Powered-By: PHP/5.2.5

Request headers:

    Accept: */*
    Accept-Encoding: gzip, deflate
    Accept-Language: pl,en-US;q=0.7,en;q=0.3
    Cache-Control: no-cache
    Connection: keep-alive
    Content-Length: 99
    Content-Type: application/x-www-form-urlencoded; charset=UTF-8
    Host: url
    Pragma: no-cache
    Referer: url/index.php
    User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:28.0) Gecko/20100101 Firefox/28.0
    X-Requested-With: XMLHttpRequest

Next to it is the Response tab, and that response is what I am interested in:

    `<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta name="Description" content="Wyszukiwarka telefonów"  />
    <meta name="Author" content="LR"  />
    <title>tel</title>
    <link rel="stylesheet" type="text/css" href="/img/themes/blue/style.css" />

    </head>
    <body>

 <div id="contenttable">
    <table class="scroll" cellpadding="0" cellspacing="0" width="100%" >
      <thead >
        <tr>
          <td colspan="11">Lista wyników *</td>
        </tr>
      </thead>


      <tbody >
        ROWS WITH TELEPHONES
    </tbody>

    </table>
    <table class="scroll" cellpadding="0" cellspacing="0" width="100%" >
      <tbody >
      </tbody>
      <tfoot align="center">
        <tr>
          <td colspan="11" style="text-align:left"><img src="img/themes/blue/images/first.png"  onclick="jQuery('#page').val(1);gridReloadTel()" /> <img src="img/themes/blue/images/prev.png" onclick="jQuery('#page').val(1);gridReloadTel()" />
            <input id="page" type="text" value="2" size="3" maxlength="5"  onkeydown="doSearchTel(arguments[0]||event)" />
            / 802 <img src="img/themes/blue/images/next.png" onclick="jQuery('#page').val(3);gridReloadTel()" /> <img src="img/themes/blue/images/last.png" onclick="jQuery('#page').val(802);gridReloadTel()" /> | wyświetl
            <select id="rows" name="rows" onchange="gridReloadTel()">
              <option value="15" selected >15</option>
              <option value="25"  >25</option>
              <option value="50"  >50</option>
              <option value="200"  >200</option> 
              </select>
            | 12016 wierszy</td>
          </tr>
      </tfoot>
    </table>

    </div>
    <div style="position:absolute; top:140px; right:20px;"  class="hideen"><form action="export.php" method="post" target="_blank" id="exportform" name="exportform" >
        <a href="javascript:document.exportform.submit();" onmouseout="MM_swapImgRestore()" onmouseover="MM_swapImage('xlsex','','img/xls_down.jpg',1)"><img src="img/xls_up.jpg" name="xlsex"  border="0" id="xlsex" title="Wygeneruj spis wyb" /></a>
        <input name="sord" type="hidden" value="PRNazwa asc" /><input name="where" type="hidden" value=" 1=1 " />
        <input type="hidden" name="start" value="15" />
        <input type="hidden" name="limit" value="15" />
    </form></div>

    <script type="text/javascript">

      var _gaq = _gaq || [];
      _gaq.push(['_setAccount', '']);
      _gaq.push(['_trackPageview']);

      (function() {
        var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
        ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
        var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
      })();

    </script>
    </body></html>`

How can I retrieve this data from div id='contenttable'? Any answers or ideas would be very helpful.

1 answer:

Answer 0 (score: 1):

Try Mechanize:

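    require 'mechanize'
    require 'logger'

    # 'url' stands in for the site's base address, as in the question.
    @agent = Mechanize.new do |a|
      a.user_agent_alias = 'Windows Chrome'
      a.log = Logger.new 'activity.log'
      a.get 'url/index.php' # fetch the page first (picks up any session cookies)
    end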

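Now you can submit the POST request with:

    # post(uri, query, headers): form fields first, then extra request headers.
    # The path follows the POST shown in the question's Firebug output.
    @agent.post('url/grid/search.php', { 'foo' => 'bar' }, { 'X-Requested-With' => 'XMLHttpRequest' })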

To find the query parameters and headers, look at the request headers in your browser's developer tools.
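As a rough illustration of carrying those values over into Ruby, here is a sketch that uses only the standard library's Net::HTTP to build (not send) a POST mirroring the XHR from the question. The host `http://localhost`, the helper name `build_search_request`, and the field names `page` and `rows` are assumptions: the real parameters are whatever the Post tab of the request shows.

```ruby
require 'net/http'
require 'uri'

# Builds, but does not send, a POST resembling the XHR seen in Firebug.
# base_url and form_fields are placeholders to be replaced with the real
# host and the real parameters copied from the developer tools.
def build_search_request(base_url, form_fields)
  uri = URI.join(base_url, '/grid/search.php')
  request = Net::HTTP::Post.new(uri)
  # Encodes the fields and sets Content-Type to
  # application/x-www-form-urlencoded.
  request.set_form_data(form_fields)
  request['X-Requested-With'] = 'XMLHttpRequest'
  request['Referer'] = URI.join(base_url, '/index.php').to_s
  request['Accept'] = '*/*'
  request
end

# Hypothetical field names, guessed from the #page and #rows inputs in the
# response markup; the actual form may use different ones.
request = build_search_request('http://localhost', 'page' => '1', 'rows' => '15')
puts request.body

# To actually send it:
# response = Net::HTTP.start(request.uri.hostname, request.uri.port) do |http|
#   http.request(request)
# end
```

The same hash of fields and the same headers can be passed as the second and third arguments of Mechanize's `post`.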