Why can't I reach this table with my Nokogiri selector?

Asked: 2013-10-30 02:02:33

Tags: html ruby nokogiri

Here is the XPath from Chrome's "Inspect Element":

//*[@id="configparse_port_list"]

Here is the Nokogiri CSS selector I'm using to reach the table:

doc.css("#configparse_port_list")

But all I get back is an empty array.

What am I doing wrong?

This doesn't work either:

doc.css('table[@id="configparse_port_list"]')

HTML:

<!DOCTYPE html>
<html>
<head>
  <title>SIAM</title>
  <link href="/assets/application-49cce08127ac99204d4cb59e3bfaab8e.css" media="all" rel="stylesheet" type="text/css" />
  <script src="/assets/application-50259c7e8f6a002b7166ab714e68857b.js" type="text/javascript"></script>
  <script src="/assets/controllers/configparse_ports-925b92a6e41f7ffc3014e351d29291fc.js" type="text/javascript"></script>
  <meta content="authenticity_token" name="csrf-param" />
<meta content="FFh3mbfqnLZhWclBmQ/kEeYSJPeQvapaC0tK9f4wWH8=" name="csrf-token" />
</head>

<body class="configparse_ports_index ctrl_configparse_ports" data-controller="configparse_ports" data-action="index">
    <div id="header">
        <a href="https://siam-pro.qa.domain.com/"><img alt="domain_logo" src="/assets/domain_logo-0e44a80f1d9f1f9ce8fb7aa35dbc008b.png" /></a>
        <div>
            <div class="product_name">SIAM</div>
            <div class="version">v5.1</div>
        </div>

        <form accept-charset="UTF-8" action="/search/quick.json" class="ignoreDirty" data-remote="true" id="quick_search" method="post"><div style="margin:0;padding:0;display:inline    "><input name="utf8" type="hidden" value="&#x2713;" /><input name="authenticity_token" type="hidden" value="FFh3mbfqnLZhWclBmQ/kEeYSJPeQvapaC0tK9f4wWH8=" /></div>

            <input id="search_testcases" name="search[testcases]" type="hidden" value="true" />
            <input id="search_testplans" name="search[testplans]" type="hidden" value="true" />
            <input id="search_component_names" name="search[component_names]" type="hidden" value="true" />
            <input autocomplete="off" id="search_term" name="search[term]" placeholder="Search" type="text" />
</form>
        <ul class="menu">
            <li><a href="https://siam-pro.qa.domain.com/">Home</a></li>
                <li><a href="/settings">Settings</a></li>
        </ul>
    </div>

    <div id="wrapper">
        <div id="content">
            <div id="loading">Loading ...</div>
            <div id="flash">


            </div>
            <div id="warning_message"></div>
            <h1>Listing Configparse Ports</h1>

<div id="configparse_port_filters" class="filter_wrap">
    <h4>Filter &nbsp;</h4>
</div>

<table id="configparse_port_list">
    <thead>
        <tr>
            <th>ID #</th>
            <th>Name</th>
            <th>ANI Release</th>
            <th>Network Configuration</th>
            <th>State</th>
        </tr>
    </thead>

    <tbody>
            <tr>
#MANY TRS - one of which I'm looking for based on the 3rd td (ANI Release)
            </tr>
    </tbody>
</table>


        </div>
    </div>

    <div id="sidebar">
        <h3>Testcases</h3>
        <ul>
            <li><a href="/testcases/new">New</a></li>
            <li><a href="/search/testcase/new">Search</a></li>
            <li><a href="/search/bugzilla_cr/new">Import RTC</a></li>
        </ul>

        <h3>Testplans</h3>
        <ul>
            <li><a href="/testplans/new">New</a></li>
            <li><a href="/search/testplan/new">Search</a></li>
            <li><a href="/testplans">List Active</a></li>
        </ul>

        <h3>Use Cases</h3>
        <ul>
            <li><a href="/use_cases/new">New</a></li>
            <li><a href="/search/use_case/new">Search</a></li>
            <li><a href="/use_cases/manage">Manage</a></li>
        </ul>

        <h3>Configparse</h3>
        <ul>
            <li><a href="/configparse_ports/new">New</a></li>
            <li><a href="/configparse_ports">List Ports</a></li>
        </ul>

        <h3>Automation</h3>
        <ul>
            <li><a href="/automation_suites/new">New</a></li>
            <li><a href="/search/automation_suite/new">Search</a></li>
            <li><a href="/automation/status">Status</a></li>
        </ul>

    </div>

    <div id="footer">
        <div>
            <ul class="menu">
                <li><a href="mailto:siam-help@domain.com">Email SIAM Support</a></li>
                <li><a href="http://agora.domain.com/wiki/SIAM">SIAM WIKI</a></li>
            </ul>

            <div class="copyright">&copy; 2012 domain Technologies</div>
        </div>
    </div>

    <script id="quick_search_results_template" type="text/html">
<div>
    {{#resources}}
    <div class="search_result search_result_{{internal_name}}">
        <h4>{{name}}</h4>

        {{#count}}
        <table>
            <thead>
                <tr>
                    <th>ID</th>
                    <th></th>
                </tr>
            </thead>
            <tbody>
                {{#results}}
                <tr class="search_result_{{id}}">
                    <td><a href="{{url}}">{{id}}</a></td>
                    <td class="search_result_name"><a title="{{name}}" href="{{url}}">{{name}}</a></td>
                </tr>
                {{/results}}
            </tbody>
        </table>
        <a class='more_results' href="{{search_url}}">More results</a>
        {{/count}}
        {{^results}}
        <div class='no_results'>No matches found</div>
        {{/results}}
    </div>
    {{/resources}}
</div>
</script>

    <script type="text/html" id="warning_message_template">
    <div class="ui-widget" id="warning_message">
    <div class="ui-state-highlight ui-corner-all">
        <span class="ui-icon ui-icon-info"></span>
        <p>{{message}}</p>
    </div>
</div>

</script>


    <!-- notification template -->
    <div id="notifcation-container" style="display:none">
        <div id="basic-template">
            <a class="ui-notify-cross ui-notify-close" href="#">x</a>
            <h1>#{title}</h1>
            <p>#{text}</p>
        </div>
    </div>
</body>
</html>    

2 Answers:

Answer 0 (score: 0)

Using the following code, I have no trouble finding the element with id="configparse_port_list":

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<!DOCTYPE html>
<html>
<head>
  <title>SIAM</title>
  <link href="/assets/application-49cce08127ac99204d4cb59e3bfaab8e.css" media="all" rel="stylesheet" type="text/css" />
  <script src="/assets/application-50259c7e8f6a002b7166ab714e68857b.js" type="text/javascript"></script>
  <script src="/assets/controllers/configparse_ports-925b92a6e41f7ffc3014e351d29291fc.js" type="text/javascript"></script>
  <meta content="authenticity_token" name="csrf-param" />
  <meta content="FFh3mbfqnLZhWclBmQ/kEeYSJPeQvapaC0tK9f4wWH8=" name="csrf-token" />
</head>

<body class="configparse_ports_index ctrl_configparse_ports" data-controller="configparse_ports" data-action="index">
<div id="header">
  <a href="https://siam-pro.qa.domain.com/"><img alt="domain_logo" src="/assets/domain_logo-0e44a80f1d9f1f9ce8fb7aa35dbc008b.png" /></a>
  <div>
    <div class="product_name">SIAM</div>
    <div class="version">v5.1</div>
  </div>

  <form accept-charset="UTF-8" action="/search/quick.json" class="ignoreDirty" data-remote="true" id="quick_search" method="post"><div style="margin:0;padding:0;display:inline    "><input name="utf8" type="hidden" value="&#x2713;" /><input name="authenticity_token" type="hidden" value="FFh3mbfqnLZhWclBmQ/kEeYSJPeQvapaC0tK9f4wWH8=" /></div>

    <input id="search_testcases" name="search[testcases]" type="hidden" value="true" />
    <input id="search_testplans" name="search[testplans]" type="hidden" value="true" />
    <input id="search_component_names" name="search[component_names]" type="hidden" value="true" />
    <input autocomplete="off" id="search_term" name="search[term]" placeholder="Search" type="text" />
  </form>
  <ul class="menu">
    <li><a href="https://siam-pro.qa.domain.com/">Home</a></li>
    <li><a href="/settings">Settings</a></li>
  </ul>
</div>

<div id="wrapper">
  <div id="content">
    <div id="loading">Loading ...</div>
    <div id="flash">


    </div>
    <div id="warning_message"></div>
    <h1>Listing Configparse Ports</h1>

    <div id="configparse_port_filters" class="filter_wrap">
      <h4>Filter &nbsp;</h4>
    </div>

    <table id="configparse_port_list">
      <thead>
        <tr>
          <th>ID #</th>
          <th>Name</th>
          <th>ANI Release</th>
          <th>Network Configuration</th>
          <th>State</th>
        </tr>
      </thead>

      <tbody>
      <tr>
        #MANY TRS - one of which I'm looking for based on the 3rd td (ANI Release)
      </tr>
      </tbody>
    </table>


  </div>
</div>

<div id="sidebar">
  <h3>Testcases</h3>
  <ul>
    <li><a href="/testcases/new">New</a></li>
    <li><a href="/search/testcase/new">Search</a></li>
    <li><a href="/search/bugzilla_cr/new">Import RTC</a></li>
  </ul>

  <h3>Testplans</h3>
  <ul>
    <li><a href="/testplans/new">New</a></li>
    <li><a href="/search/testplan/new">Search</a></li>
    <li><a href="/testplans">List Active</a></li>
  </ul>

  <h3>Use Cases</h3>
  <ul>
    <li><a href="/use_cases/new">New</a></li>
    <li><a href="/search/use_case/new">Search</a></li>
    <li><a href="/use_cases/manage">Manage</a></li>
  </ul>

  <h3>Configparse</h3>
  <ul>
    <li><a href="/configparse_ports/new">New</a></li>
    <li><a href="/configparse_ports">List Ports</a></li>
  </ul>

  <h3>Automation</h3>
  <ul>
    <li><a href="/automation_suites/new">New</a></li>
    <li><a href="/search/automation_suite/new">Search</a></li>
    <li><a href="/automation/status">Status</a></li>
  </ul>

</div>

<div id="footer">
  <div>
    <ul class="menu">
      <li><a href="mailto:siam-help@domain.com">Email SIAM Support</a></li>
      <li><a href="http://agora.domain.com/wiki/SIAM">SIAM WIKI</a></li>
    </ul>

    <div class="copyright">&copy; 2012 domain Technologies</div>
  </div>
</div>

<script id="quick_search_results_template" type="text/html">
<div>
        {{#resources}}
        <div class="search_result search_result_{{internal_name}}">
            <h4>{{name}}</h4>

            {{#count}}
            <table>
                <thead>
                    <tr>
                        <th>ID</th>
                        <th></th>
                    </tr>
                </thead>
                <tbody>
                    {{#results}}
                    <tr class="search_result_{{id}}">
                        <td><a href="{{url}}">{{id}}</a></td>
                        <td class="search_result_name"><a title="{{name}}" href="{{url}}">{{name}}</a></td>
                    </tr>
                    {{/results}}
                </tbody>
            </table>
            <a class='more_results' href="{{search_url}}">More results</a>
            {{/count}}
            {{^results}}
            <div class='no_results'>No matches found</div>
            {{/results}}
        </div>
        {{/resources}}
    </div>
</script>

<script type="text/html" id="warning_message_template">
<div class="ui-widget" id="warning_message">
        <div class="ui-state-highlight ui-corner-all">
            <span class="ui-icon ui-icon-info"></span>
            <p>{{message}}</p>
        </div>
    </div>

</script>


<!-- notification template -->
<div id="notifcation-container" style="display:none">
  <div id="basic-template">
    <a class="ui-notify-cross ui-notify-close" href="#">x</a>
    <h1>title</h1>
    <p>text</p>
  </div>
</div>
</body>
</html>    
EOT

After running that, the HTML is parsed and ready to search:

configparse_port_list = doc.at('#configparse_port_list')
configparse_port_list.to_html
# => "<table id=\"configparse_port_list\">\n<thead><tr>\n<th>ID #</th>\n          <th>Name</th>\n          <th>ANI Release</th>\n          <th>Network Configuration</th>\n          <th>State</th>\n        </tr></thead>\n<tbody><tr>\n        #MANY TRS - one of which I'm looking for based on the 3rd td (ANI Release)\n      </tr></tbody>\n</table>"

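One thing I'd be careful about doing:

doc.css("#configparse_port_list")

is a contradiction of sorts. css is meant to return all nodes that match a selector, but #configparse_port_list should occur only once in a document because it is an ID. Nokogiri will happily return a NodeSet holding that one element, but that can confuse someone reading the code who isn't paying attention. I'd recommend writing it as at("#configparse_port_list") instead: at returns a single node, which keeps the code in sync with the fact that only one element can match the ID.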

doc.css('#configparse_port_list').class # => Nokogiri::XML::NodeSet
doc.css('#configparse_port_list').size # => 1

These work too; just keep in mind the earlier caveat about css and single elements:

doc.css('table[@id="configparse_port_list"]').size # => 1
doc.css('table#configparse_port_list').size # => 1

You might want to check that your Nokogiri and libxml2 environment is up to date:

nokogiri -v

The current Nokogiri version at the time of writing is 1.6.0.

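Note that Nokogiri isn't happy with the document. The errors most likely come from the script type="text/html" templates, whose contents include closing tags: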

doc.errors
# => [#<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>]

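Answer 1 (score: 0)

I was stuck behind a pubcookie authentication server, so the HTML my script received was not the page I saw in Chrome. Once I authenticated, I could access the HTML table the way I originally tried (although using .at is better when fetching a node by ID).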