How do I find direct children, not nested children, using Rails and Nokogiri?

Date: 2016-11-30 17:16:54

Tags: css ruby-on-rails ruby css-selectors nokogiri

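I'm using Rails 4.2.7 with Ruby (2.3) and Nokogiri. How do I find only the most direct tr children of a table, rather than nested ones? Currently I'm finding the table rows like this...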

  tables = doc.css('table')
  tables.each do |table|
    rows = table.css('tr')

This finds not only the table's direct rows, e.g.

<table>
    <tbody>
        <tr>…</tr>

but it also finds rows nested inside other rows, e.g.

<table>
    <tbody>
        <tr>
            <td>
                <table>
                    <tr>This is found</tr>
                </table>
            </td>
        </tr>

How do I refine my search to find only the direct tr elements?

2 answers:

Answer 0: (score: 0)

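I don't know whether this can be done directly with CSS/XPath, so I wrote a small method that searches for the node recursively. It stops recursing along a branch as soon as it finds a match.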

xml= %q{
<root>
  <table>
    <tbody>
      <tr nested="false">
        <td>
          <table>
            <tr nested="true">
              This is found</tr>
          </table>
        </td>
      </tr>
    </tbody>
  </table>
  <another_table>
    <tr nested = "false">
      <tr nested = "true"/>
    </tr>
  </another_table>
  <tr nested = "false"/>
</root>
}

require 'nokogiri'

doc = Nokogiri::XML.parse(xml)

class Nokogiri::XML::Node
  def first_children_found(desired_node)
    if name == desired_node
      [self]
    else
      element_children.map{|child|
        child.first_children_found(desired_node)
      }.flatten
    end
  end
end

doc.first_children_found('tr').each do |tr|
  puts tr["nested"]
end

#=>
# false
# false
# false

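Answer 1: (score: 0)

You can do this in a couple of steps with XPath. First find the "level" of the table (i.e. how deeply it is nested within other tables), then find all descendant tr elements that have the same number of table ancestors: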

tables = doc.xpath('//table')
tables.each do |table|
  level = table.xpath('count(ancestor-or-self::table)')
  rows = table.xpath(".//tr[count(ancestor::table) = #{level}]")
  # do what you want with rows...
end

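In the more general case, where you might have tr elements nested directly inside other tr elements, you can do something like the following (that would be invalid HTML, but you might have XML or some other markup):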

tables.each do |table|
  # Find the first descendant tr, and determine its level. This
  # will be a "top-level" tr for this table. "level" here means how
  # many tr elements (including itself) are between it and the
  # document root.
  level = table.xpath("count(descendant::tr[1]/ancestor-or-self::tr)")
  # Now find all descendant trs that have that same level. Since
  # the table itself is at a fixed level, this means all these nodes
  # will be "top-level" rows for this table.
  rows = table.xpath(".//tr[count(ancestor-or-self::tr) = #{level}]")
  # handle rows...
end

The first step can be broken into two separate queries, which may be clearer:

first_tr = table.at_xpath(".//tr")
level = first_tr.xpath("count(ancestor-or-self::tr)")

(This will fail if a table has no tr, because first_tr will be nil. The combined XPath above handles that case correctly.)