I have the following HTML and need to use the <span> ID to find the column index of "Number of Strings". I am using Nokogiri to parse the HTML and get the row.
doc = Nokogiri::HTML(File.read('myfile.html'))
table = doc.xpath("//span[@id='NumStrs']/../../..")
row = table.xpath("tr[1]")
Here is the HTML:
<tr>
<th id ="langframe">
<span id="cabinet">
Cabinet</span>
</th>
<th id ="langbb1">
<span id="bb1">
BB1</span>
</th>
<th id ="langbb2">
<span id="bb2">
BB2</span>
</th>
<th id ="langtemp">
<span id="Temp">
Temperature</span>
</th>
<th id="langstrs">
<span id="StringsPresent">
Strings Present</span>
</th>
<th id="langmstrQty">
<span id="NumStrs">
Number of Strings</span>
</th>
</tr>
Answer 0 (score: 2)
I did it using Ruby's with_index combined with select:
require 'nokogiri' # => true
doc = Nokogiri::HTML(<<EOT)
<tr>
<th id ="langframe">
<span id="cabinet">
Cabinet</span>
</th>
<th id ="langbb1">
<span id="bb1">
BB1</span>
</th>
<th id ="langbb2">
<span id="bb2">
BB2</span>
</th>
<th id ="langtemp">
<span id="Temp">
Temperature</span>
</th>
<th id="langstrs">
<span id="StringsPresent">
Strings Present</span>
</th>
<th id="langmstrQty">
<span id="NumStrs">
Number of Strings</span>
</th>
</tr>
EOT
th_idx = doc.search('th').to_enum.with_index.select { |th, idx| th.text['Number of Strings'] }.first
This returns:
th_idx
# => [#(Element:0x3fe72d83cd3c {
# name = "th",
# attributes = [
# #(Attr:0x3fe72d4440f4 { name = "id", value = "langmstrQty" })],
# children = [
# #(Text "\n"),
# #(Element:0x3fe72d43c3e0 {
# name = "span",
# attributes = [
# #(Attr:0x3fe72d439b04 { name = "id", value = "NumStrs" })],
# children = [ #(Text "\nNumber of Strings")]
# }),
# #(Text "\n")]
# }),
# 5]
The index is:
th_idx.last # => 5
Once you have th_idx, you can easily access the node's parent or children to learn about its surroundings:
th_node = th_idx.first
th_node['id'] # => "langmstrQty"
th_node.at('span')
# => #(Element:0x3fd5110286d8 {
# name = "span",
# attributes = [
# #(Attr:0x3fd511021b6c { name = "id", value = "NumStrs" })],
# children = [ #(Text "\nNumber of Strings")]
# })
th_node.at('span')['id'] # => "NumStrs"
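with_index adds a zero-based index to each element passed to it. to_enum is needed because search returns a NodeSet, which is Enumerable but is not itself an Enumerator, and with_index is only defined on Enumerator; to_enum returns one.
If you want a 1-based index, use with_index(1).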
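To see the difference between the two offsets without any HTML involved, here is a small sketch on a plain array. The headers array is a hypothetical stand-in for the text of the six th cells:

```ruby
# Hypothetical stand-in for the header labels in the table above.
headers = ['Cabinet', 'BB1', 'BB2', 'Temperature',
           'Strings Present', 'Number of Strings']

# Array#each without a block returns an Enumerator, so with_index can be chained.
zero_based = headers.each.with_index.select { |label, _idx| label == 'Number of Strings' }.first
one_based  = headers.each.with_index(1).select { |label, _idx| label == 'Number of Strings' }.first

p zero_based # => ["Number of Strings", 5]
p one_based  # => ["Number of Strings", 6]
```

Each result is an [element, index] pair, which is why the answer reads the index with th_idx.last.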
Answer 1 (score: 1)
Got it working. I'm not sure whether this is an efficient approach, but it works:
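header = table.xpath("tr[1]")
# id holds the span ID to look up, e.g. 'NumStrs'; ".//" keeps the search inside the header row
value = header.xpath(".//span[@id='#{id}']").text.strip
# strip on both sides so the span text matches the cleaned-up header labels
index = header.search('th//text()').collect {|text| text.to_s.strip}.reject(&:empty?).index(value)+1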