I have something like this:
<div id="sub_div">
<span class="subl">
<div class="node">2204830011</div>
<div class="node">1571827122</div>
<div class="node">...</div>
<div class="node">...</div>
<div class="node">...</div>
</span>
<span class="subl">
<div class="node">...</div>
<div class="node">...</div>
<div class="node">...</div>
<div class="node">...</div>
<div class="node">...</div>
</span>
<span class="subl">
<div class="node">...</div>
<div class="node">...</div>
<div class="node">...</div>
</span>
</div>
Right now, I'm doing this:
def self.parse_nodes
  id = @data.at_css("#n_info #clipnode").text unless @data.at_css("#n_info #clipnode").nil?
  name = @data.at_css("#n_info .node_name").text unless @data.at_css("#n_info .node_name").nil?
  parent = @data.at_css(".bc a").text unless @data.at_css(".bc a").nil?
  children_array = []
  children = @data.css('#sub_div')
  children.css('.subl').each do | child |
    child_id = child.css('.node').text[/[\d,]+/].to_i
    children_array ||= []
    children_array << child_id
  end
  nodes_hash = "id: #{id}, name: #{name}, parent: #{parent}, children: #{children_array}"
  nodes_hash
end
What I get is something like this:
[220483001115718271223064201115857511158575013463330111571879115709231157103512258019011157197311570657115706941,
220483001115718271223064201115857511158575013463330111571879115709231157103512258019011157197311570657115706941,
220483001115718271223064201115857511158575013463330111571879115709231157103512258019011157197311570657115706941]
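I don't know why the code outputs all the .node values three times. In any case, what I want is to scrape the content of each .node div inside the .subl spans and return the values as an array: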
[2204830011, 1571827122, 3064201115, 8575111585, 7501346333,
0111571879, 1157092311, 5710351225, 8019011157, 1973115706,
57115706941]
Answer 0 (score: 1)
Try the following:
children = @data.css('#sub_div')
children_array = children.css('.subl .node').map { |node| node.text.to_i }
OR
children = @data.css('#sub_div')
children_array = children.css('.subl .node').map(&:text).map(&:to_i)
Answer 1 (score: 1)
Your code produces the following output:
require 'nokogiri'
html =<<END_OF_HTML
<div id="sub_div">
<span class="subl">
<div class="node">2204830011</div>
<div class="node">1571827122</div>
<div class="node">...</div>
<div class="node">...</div>
<div class="node">...</div>
</span>
<span class="subl">
<div class="node">...</div>
<div class="node">...</div>
<div class="node">...</div>
<div class="node">...</div>
<div class="node">...</div>
</span>
<span class="subl">
<div class="node">1</div>
<div class="node">...</div>
<div class="node">...</div>
</span>
</div>
END_OF_HTML
doc = Nokogiri::HTML(html)
children_array = []
children = doc.css('#sub_div')
children.css('.subl').each do | child |
  child_id = child.css('.node').text[/[\d,]+/].to_i
  children_array ||= []
  children_array << child_id
end
p children_array
--output:--
[22048300111571827122, 0, 1]
The numbers end up joined together because when you write:
child.css('.node')
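...you get a NodeSet containing every div with class="node" inside that span. The text() method extracts all the text nodes from the NodeSet and concatenates them, with no whitespace in between: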
require 'nokogiri'
html = "<div><span>hello</span><span>world</span></div>"
doc = Nokogiri::HTML(html)
spans = doc.css("span")
puts spans.text
--output:--
helloworld
So when you write:
child.css('.node').text
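...you concatenate many numbers into a single string, and the regex then matches only the first run of digits in that string.
Here is what you can do instead: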
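To make the mechanics concrete, here is a minimal sketch in plain Ruby that does not need Nokogiri. The texts array is a hypothetical stand-in for the text of each .node div in the first .subl span, and join mimics the concatenation that text() performs on the whole NodeSet:

```ruby
# Hypothetical stand-in for the text of each .node div in one .subl span.
texts = ["2204830011", "1571827122", "...", "...", "..."]

# Joining with no separator mimics calling text on the whole NodeSet.
joined = texts.join
fused = joined[/[\d,]+/].to_i

# Matching each string on its own keeps the numbers separate;
# strings with no digits yield nil and are dropped by compact.
separate = texts.map { |t| t[/[\d,]+/] }.compact.map(&:to_i)

p fused     # => 22048300111571827122
p separate  # => [2204830011, 1571827122]
```

The first result is the fused value seen in the output earlier; the second is what you get once each node's text is handled individually.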
require 'nokogiri'
html =<<END_OF_HTML
<div id="sub_div">
<span class="subl">
<div class="node">2204830011</div>
<div class="node">1571827122</div>
<div class="node">...</div>
<div class="node">...</div>
<div class="node">...</div>
</span>
<span class="subl">
<div class="node">...</div>
<div class="node">...</div>
<div class="node">...</div>
<div class="node">...</div>
<div class="node">...</div>
</span>
<span class="subl">
<div class="node">3333333</div>
<div class="node">...</div>
<div class="node">...</div>
</span>
</div>
END_OF_HTML
doc = Nokogiri::HTML(html)
results = []
doc.css("#sub_div span.subl div.node").each do |div|
  if num = div.text[/[\d,]+/]
    results << num.to_i
  end
end
p results
--output:--
[2204830011, 1571827122, 3333333]
Answer 2 (score: 0)
Here is another approach:
require 'nokogiri'
doc = Nokogiri::HTML::Document.parse <<-eotl
<div id="sub_div">
<span class="subl">
<div class="node">2204830011</div>
<div class="node">1571827122</div>
<div class="node">...</div>
<div class="node">...</div>
<div class="node">...</div>
</span>
<span class="subl">
<div class="node">...</div>
<div class="node">...</div>
<div class="node">...</div>
<div class="node">...</div>
<div class="node">...</div>
</span>
<span class="subl">
<div class="node">3333333</div>
<div class="node">...</div>
<div class="node">...</div>
</span>
</div>
eotl
doc.xpath("//div[@id='sub_div']//div[@class='node'][boolean(number()) or . = 0]").map{|n| n.text.to_i}
# => [2204830011, 1571827122, 3333333]