Given an HTML file:
<div>
<div class="NormalMid">
<span class="style-span">
"Data 1:"
<a href="http://site.com/data/1">1</a>
<a href="http://site.com/data/2">2</a>
</span>
</div>
...more divs
<div class="NormalMid">
<span class="style-span">
"Data 20:"
<a href="http://site.com/data/20">20</a>
<a href="http://site.com/data/21">21</a>
<a href="http://site.com/data/22">22</a>
<a href="http://site.com/data/23">23</a>
</span>
</div>
...more divs
</div>
I used these SO posts as references: How do I integrate these two conditions block codes to mine in Ruby? and How to understand this Arrays and loops in Ruby?
My code:
require 'nokogiri'
require 'pp'
require 'open-uri'
data_file = 'site.htm'
file = File.open(data_file, 'r')
html = open(file)
page = Nokogiri::HTML(html)
page.encoding = 'utf-8'
rows = page.xpath('//div[@class="NormalMid"]')
details = rows.collect do |row|
detail = {}
[
[row.children.first.element_children,row.children.first.element_children],
].each do |part, link|
data = row.children[0].children[0].to_s.strip
links = link.collect {|item| item.at_xpath('@href').to_s.strip}
detail[data.to_sym] = links
end
detail
end
details.reject! {|d| d.empty?}
pp details
Output:
[{:"Data 1:"=>
["http://www.site.com/data/1",
"http://www.site.com/data/2"]},
...
{:"Data 20 :"=>
["http://www.site.com/data/20",
"http://www.site.com/data/21",
"http://www.site.com/data/22",
"http://www.site.com/data/20",]},
...
}]
Everything works, and this is exactly what I want.
But if I change these lines:
detail = {}
[
[row.children.first.element_children,row.children.first.element_children],
].each do |part, link|
to:
detail = {}
[
[row.children.first.element_children],
].each do |link|
I get this output:
[{:"Data 1:"=>
["http://www.site.com/data/1"]},
...
{:"Data 20 :"=>
["http://www.site.com/data/20"]},
...
}]
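Only the first anchor's href is stored in the array.
I just need clarification on why it behaves this way. The part parameter in the block's parameter list is never used, so I assumed I didn't need it. But if I remove the corresponding row.children.first.element_children, my program stops working correctly.
What is happening in the [[obj,obj],].each do block? I started Ruby only a week ago and am still getting used to the syntax, so any help would be appreciated. Thank you :D
Edit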
rows[0].children.first.element_children[0]
produces the output
#<Nokogiri::XML::Element:0xcea69c name="a" attributes=[#<Nokogiri::XML::Attr:0xcea648
name="href" value="http://www.site.com/data/1">] children=[#<Nokogiri::XML::Text:0xcea1a4
"1">]>
puts rows[0].children.first.element_children[0]
<a href="http://www.site.com/data/1">1</a>
Answer 0 (score: 1)
Your code is more complicated than it needs to be. Judging from it, you seem to want something like the following:
require 'nokogiri'
doc = Nokogiri::HTML::Document.parse <<-eotl
<div>
<div class="NormalMid">
<span class="style-span">
"Data 1:"
<a href="http://site.com/data/1">1</a>
<a href="http://site.com/data/2">2</a>
</span>
</div>
<div class="NormalMid">
<span class="style-span">
"Data 20:"
<a href="http://site.com/data/20">20</a>
<a href="http://site.com/data/21">21</a>
<a href="http://site.com/data/22">22</a>
<a href="http://site.com/data/23">23</a>
</span>
</div>
</div>
eotl
rows = doc.xpath("//div[@class='NormalMid']/span[@class='style-span']")
val = rows.map do |row|
[row.at_xpath("./text()").to_s.tr('"','').strip,row.xpath(".//@href").map(&:to_s)]
end
Hash[val]
# => {"Data 1:"=>["http://site.com/data/1", "http://site.com/data/2"],
# "Data 20:"=>
# ["http://site.com/data/20",
# "http://site.com/data/21",
# "http://site.com/data/22",
# "http://site.com/data/23"]}
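The last step above relies on Hash[] turning an array of [key, value] pairs into a hash. As a small sketch of just that step, without Nokogiri: PAIRS and pairs_to_hash are names made up for this illustration, and the pairs are a shortened copy of the values from the question's HTML. On Ruby 2.1 or later, Array#to_h should give the same result.

```ruby
# Hypothetical stand-in for the array that rows.map builds above:
# each element is a [label, links] pair.
PAIRS = [
  ["Data 1:", ["http://site.com/data/1", "http://site.com/data/2"]],
  ["Data 20:", ["http://site.com/data/20", "http://site.com/data/21"]]
].freeze

# Hash[] treats every inner two-element array as one key/value pair.
def pairs_to_hash(pairs)
  Hash[pairs]
end

p pairs_to_hash(PAIRS)
# Array#to_h builds the same hash from the same pairs.
p PAIRS.to_h == pairs_to_hash(PAIRS)
```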
What is happening in the [[obj,obj],].each do block?
Look at the following two snippets:
[[1],[4,5]].each do |a|
p a
end
# >> [1]
# >> [4, 5]
[[1,2],[4,5]].each do |a,b|
p a, b
end
# >> 1
# >> 2
# >> 4
# >> 5
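Applied to the question: with a single block parameter, link is bound to the whole inner array, so it is a one-element array wrapping the node set. link.collect then runs once, and at_xpath on a node set returns only the first match, which would explain the single href. With two parameters the inner array is destructured, so link is the node set itself. The sketch below shows the binding with plain arrays. LINKS and the three helper methods are hypothetical names used only for this illustration, with LINKS standing in for the result of element_children.

```ruby
# Hypothetical stand-in for the node set returned by element_children.
LINKS = ["http://site.com/data/1", "http://site.com/data/2"].freeze

# One block parameter: the block receives each inner array as a whole.
def bound_with_one_param(nested)
  bound = nil
  nested.each { |link| bound = link }
  bound
end

# Two block parameters: each inner array is destructured across them.
def bound_with_two_params(nested)
  bound = nil
  nested.each { |_part, link| bound = link }
  bound
end

# A trailing comma makes a single name destructure the inner array too.
def bound_with_trailing_comma(nested)
  bound = nil
  nested.each { |link,| bound = link }
  bound
end

p bound_with_one_param([[LINKS]])          # link is the wrapper array [LINKS]
p bound_with_two_params([[LINKS, LINKS]])  # link is LINKS itself
p bound_with_trailing_comma([[LINKS]])     # link is LINKS itself
```

So the unused part parameter is what triggers the destructuring. Writing |link,| should have the same effect without duplicating the element, and dropping the outer [[...]].each wrapper entirely is simpler still.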