I'm looping over the elements of an array, and this is what comes back:
[nil, [#<Nokogiri::XML::Element:0x835386d4 name="a" attributes=[#<Nokogiri::XML::Attr:0x835385f8 name="href" value="http://bham.craigslist.org/web/2961573018.html">] children=[#<Nokogiri::XML::Text:0x835381c0 "Web Designer Full time">]>
What I want to do is access the href value and then the text value. How do I do that?
I tried:
puts i[:href]
But that produces this error:
TypeError: Symbol as array index
By the way, this is how I access i as an element of the array:
contents.each do |i|
  puts i.inspect
  puts i[:href]
end
Edit 1:
This is how I generate the contents array. I'd rather not rename it, since that might get confusing :)
contents = {}
first_items.each do |link|
  content_url = link
  content_page = Nokogiri::HTML(open(content_url))
  contents[link[:href]] = content_page.css("p a")
end
puts contents.inspect
This is the output:
{nil=>[#<Nokogiri::XML::Element:0x85fee914 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fee838 name="href" value="http://bham.craigslist.org/web/2961573018.html">] children=[#<Nokogiri::XML::Text:0x85fee400 "Web Designer Full time">]>, #<Nokogiri::XML::Element:0x85fee298 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fee1bc name="href" value="http://bham.craigslist.org/web/2959813303.html">] children=[#<Nokogiri::XML::Text:0x85fedd84 "Once in a lifetime opportunity...">]>, #<Nokogiri::XML::Element:0x85fedc1c name="a" attributes=[#<Nokogiri::XML::Attr:0x85fedb40 name="href" value="http://bham.craigslist.org/web/2925485723.html">] children=[#<Nokogiri::XML::Text:0x85fed708 "Website Designer and Blogging Internship!">]>, #<Nokogiri::XML::Element:0x85fed5a0 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fed4c4 name="href" value="http://bham.craigslist.org/web/2918424652.html">] children=[#<Nokogiri::XML::Text:0x85fed08c "Excellent Java Developer Opportunity!">]>, #<Nokogiri::XML::Element:0x85fecf24 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fece48 name="href" value="http://bham.craigslist.org/web/2888669703.html">] children=[#<Nokogiri::XML::Text:0x85feca10 "Freelance Graphic Design">]>, #<Nokogiri::XML::Element:0x85fec8a8 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fec7cc name="href" value="http://bham.craigslist.org/web/2900256461.html">] children=[#<Nokogiri::XML::Text:0x85fec394 "GWT/GXT Developer">]>, #<Nokogiri::XML::Element:0x85fec22c name="a" attributes=[#<Nokogiri::XML::Attr:0x85fec150 name="href" value="http://bham.craigslist.org/web/2897641463.html">] children=[#<Nokogiri::XML::Text:0x85febd18 "Website hiring!">]>]}
Here is the full output for i:
--------------------
This is the value of i:
[nil, [#<Nokogiri::XML::Element:0x85fee914 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fee838 name="href" value="http://bham.craigslist.org/web/2961573018.html">] children=[#<Nokogiri::XML::Text:0x85fee400 "Web Designer Full time">]>, #<Nokogiri::XML::Element:0x85fee298 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fee1bc name="href" value="http://bham.craigslist.org/web/2959813303.html">] children=[#<Nokogiri::XML::Text:0x85fedd84 "Once in a lifetime opportunity...">]>, #<Nokogiri::XML::Element:0x85fedc1c name="a" attributes=[#<Nokogiri::XML::Attr:0x85fedb40 name="href" value="http://bham.craigslist.org/web/2925485723.html">] children=[#<Nokogiri::XML::Text:0x85fed708 "Website Designer and Blogging Internship!">]>, #<Nokogiri::XML::Element:0x85fed5a0 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fed4c4 name="href" value="http://bham.craigslist.org/web/2918424652.html">] children=[#<Nokogiri::XML::Text:0x85fed08c "Excellent Java Developer Opportunity!">]>, #<Nokogiri::XML::Element:0x85fecf24 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fece48 name="href" value="http://bham.craigslist.org/web/2888669703.html">] children=[#<Nokogiri::XML::Text:0x85feca10 "Freelance Graphic Design">]>, #<Nokogiri::XML::Element:0x85fec8a8 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fec7cc name="href" value="http://bham.craigslist.org/web/2900256461.html">] children=[#<Nokogiri::XML::Text:0x85fec394 "GWT/GXT Developer">]>, #<Nokogiri::XML::Element:0x85fec22c name="a" attributes=[#<Nokogiri::XML::Attr:0x85fec150 name="href" value="http://bham.craigslist.org/web/2897641463.html">] children=[#<Nokogiri::XML::Text:0x85febd18 "Website hiring!">]>]]
--------------------
This is the value of i.href:
Edit 2:
By the way, this is what the actual HTML output looks like. This is what I did:
builder = Nokogiri::HTML::Builder.new do |doc|
  doc.html {
    doc.body {
      contents.each do |el|
        if !el.nil?
          puts "-" * 20
          puts "This is the value of el: "
          puts el.inspect
          puts "-" * 20
          puts "This is the value of el.href: "
          puts el[:href]
        end
        doc.p {
          doc.a el, :href => el
        }
      end
    }
  }
end
puts "*" * 50
puts "This is the HTML generated"
puts builder.to_html
It looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p><a href="<a%20href=%22http://bham.craigslist.org/web/2961573018.html%22>Web%20Designer%20Full%20time</a><a%20href=%22http://bham.craigslist.org/web/2959813303.html%22>Once%20in%20a%20lifetime%20opportunity...</a><a%20href=%22http://bham.craigslist.org/web/2925485723.html%22>Website%20Designer%20and%20Blogging%20Internship!</a><a%20href=%22http://bham.craigslist.org/web/2918424652.html%22>Excellent%20Java%20Developer%20Opportunity!</a><a%20href=%22http://bham.craigslist.org/web/2888669703.html%22>Freelance%20Graphic%20Design</a><a%20href=%22http://bham.craigslist.org/web/2900256461.html%22>GWT/GXT%20Developer</a><a%20href=%22http://bham.craigslist.org/web/2897641463.html%22>Website%20hiring!</a>"><a href="http://bham.craigslist.org/web/2961573018.html">Web Designer Full time</a><a href="http://bham.craigslist.org/web/2959813303.html">Once in a lifetime opportunity...</a><a href="http://bham.craigslist.org/web/2925485723.html">Website Designer and Blogging Internship!</a><a href="http://bham.craigslist.org/web/2918424652.html">Excellent Java Developer Opportunity!</a><a href="http://bham.craigslist.org/web/2888669703.html">Freelance Graphic Design</a><a href="http://bham.craigslist.org/web/2900256461.html">GWT/GXT Developer</a><a href="http://bham.craigslist.org/web/2897641463.html">Website hiring!</a></a></p></body></html>
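Answer 0 (score: 1)
I think this can be much simpler. Nokogiri has already parsed the document and provides convenient methods for accessing its content. Instead of looping, storing Nokogiri objects, and then trying to pull values back out of them, why not take a more direct approach?
Try this code: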
content_page.search('//a[@href]').map { |el| [el[:href], el.text] }
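This builds a two-dimensional array containing the href and text of every link in the document, which, going by your follow-up comments, is what you are actually after.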
Answer 1 (score: 0)
Maybe like this, since there is a stray nil in your array.
contents.each do |i|
  if !i.nil?
    puts i.inspect
    puts i[:href]
  end
end
Edit 1: Actually, I think you just need to do contents = contents[1].
contents = contents[1]
contents.each do |i|
  puts i.inspect
  puts i[:href]
end
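The inspect output in the question's Edit 1 suggests that contents is a Hash with a single nil key. In that case each hands the block a [key, value] pair, so i is a two-element Array, and indexing an Array with a Symbol raises TypeError. A sketch of that behaviour in plain Ruby, assuming small Hashes as stand-ins for the Nokogiri elements (both answer [:href]); the data and the names links, errors, and hrefs are illustrative:

```ruby
# Stand-ins for Nokogiri elements: plain Hashes that also respond to [:href].
links = [
  { href: "http://bham.craigslist.org/web/2961573018.html", text: "Web Designer Full time" },
  { href: "http://bham.craigslist.org/web/2900256461.html", text: "GWT/GXT Developer" }
]

# Same shape as the question's data: a Hash whose only key is nil.
contents = { nil => links }

# With a single block parameter, Hash#each passes a [key, value] Array.
errors = []
contents.each do |i|
  begin
    i[:href] # an Array indexed with a Symbol
  rescue TypeError => e
    errors << e.class
  end
end

# Destructure the pair, then loop over the inner collection.
hrefs = []
contents.each do |_key, nodes|
  nodes.each { |node| hrefs << node[:href] }
end

puts hrefs
```

If contents really is a Hash, contents[1] looks up the key 1 rather than the second element of a pair, so destructuring the pair in the block parameters as above is the safer route.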
Answer 2 (score: 0)
You can use compact to remove the nils:
nodes.compact.each do |node|
  puts node[:href], node.text
end
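A small sketch of what compact does, using made-up stand-in data rather than real Nokogiri nodes. The assumption here is that nodes is a flat Array that may contain nil entries; compact drops those entries but does not flatten nested Arrays:

```ruby
# Stand-in data: a flat list of link-like Hashes with a stray nil in it.
nodes = [
  nil,
  { href: "http://bham.craigslist.org/web/2888669703.html", text: "Freelance Graphic Design" },
  { href: "http://bham.craigslist.org/web/2897641463.html", text: "Website hiring!" }
]

# Array#compact returns a new Array without the nil entries.
cleaned = nodes.compact
cleaned.each { |node| puts node[:href], node[:text] }

# A [nil, [...]] pair, as printed in the question, still holds a nested Array after compact.
pair = [nil, cleaned]
inner = pair.compact.first
```

So for the [nil, [...]] pair shown in the question, compact alone still leaves one level of nesting to unwrap before calling [:href] on each element.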