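I'm trying to scrape information about new album releases from a website, and I'm using Nokogiri to do it. The idea is to build a nice array containing items like this: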
[
0 => ['The Wall', 'Pink Floyd', '1979'],
1 => ['Led Zeppelin I', 'Led Zeppelin', '1969']
]
Here is my code so far. I'm a complete Ruby newbie, so any advice would be greatly appreciated.
@events = Array.new()
# for every date we encounter
doc.css("#main .head_type_1").each do |item|
  date = item.text
  # get every album title
  doc.css(".albumTitle").each_with_index do |album, index|
    album = album.text
    @events[index]['album'] = album
    @events[index]['release_date'] = date
  end
  # get every artist name
  doc.css(".artistName").each do |artist|
    artist = artist.text
    @events[index]['artist'] = artist
  end
end
puts @events
P.S. The page I'm trying to scrape is formatted a bit strangely:
<tr><th class="head_type_1">20 October 1989</th></tr>
<tr><td class="artistName">Jean Luc-Ponty</td><td class="albumTitle">Some example album</td></tr>
<tr><td class="artistName">Some Other Artist</td><td class="albumTitle">Some example album</td></tr>
<tr><td class="artistName">Some Other Artist</td><td class="albumTitle">Some example album</td></tr>
<tr><th class="head_type_1">29 October 1989</th></tr>
<tr><td class="artistName">Some Other Artist</td><td class="albumTitle">Some example album</td></tr>
When I try to run this in the Ruby interpreter, I get the following error:
get_events.rb:25:in `block (2 levels) in <main>': undefined method `[]=' for nil:NilClass (NoMethodError)
from /Users/adrian/.rvm/gems/ruby-1.9.3-p286/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:239:in `block in each'
from /Users/adrian/.rvm/gems/ruby-1.9.3-p286/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `upto'
from /Users/adrian/.rvm/gems/ruby-1.9.3-p286/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `each'
from get_events.rb:23:in `each_with_index'
from get_events.rb:23:in `block in <main>'
from /Users/adrian/.rvm/gems/ruby-1.9.3-p286/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:239:in `block in each'
from /Users/adrian/.rvm/gems/ruby-1.9.3-p286/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `upto'
from /Users/adrian/.rvm/gems/ruby-1.9.3-p286/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `each'
from get_events.rb:18:in `<main>'
How can I fix this?
Answer 0 (score: 1)
I couldn't wrap my head around your solution, but after playing around a bit I came up with this.
require 'pp'
require 'nokogiri'
str = %Q{
<tr><th class="head_type_1">20 October 1989</th></tr>
<tr><td class="artistName">Jean Luc-Ponty</td><td class="albumTitle">Some album</td></tr>
<tr><td class="artistName">Some Other Artist</td><td class="albumTitle">Some album</td></tr>
<tr><td class="artistName">Some Other Artist</td><td class="albumTitle">Some album</td></tr>
<tr><th class="head_type_1">29 October 1989</th></tr>
<tr><td class="artistName">Some Other Artist</td><td class="albumTitle">Some album</td></tr>
}
doc = Nokogiri::HTML(str)
date = ""
result = []
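# Walk the rows in document order, remembering the most recent date header.
doc.xpath("//tr").each do |tr|
  # children also includes text nodes; this works here because the sample
  # rows have no whitespace between cells.
  children = tr.children
  if children.first["class"] == "head_type_1"
    date = children.first.content
  else
    artist, album = children.map { |c| c.content }
    result << { album: album, artist: artist, date: date }
  end
end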
pp result
Output:
[{:album=>"Some album", :artist=>"Jean Luc-Ponty", :date=>"20 October 1989"},
{:album=>"Some album", :artist=>"Some Other Artist", :date=>"20 October 1989"},
{:album=>"Some album", :artist=>"Some Other Artist", :date=>"20 October 1989"},
{:album=>"Some album", :artist=>"Some Other Artist", :date=>"29 October 1989"}]
Not exactly what you asked for, but perhaps more idiomatic Ruby, and I'm sure you can modify it if needed.
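If the nested-array shape from the question is what you really need, the hashes above can be converted in one pass. This is only a minimal sketch: it assumes `result` holds hashes with `:album`, `:artist`, and `:date` keys as in the output shown, the sample data is hard-coded so the snippet runs without Nokogiri, and `rows` is just an illustrative name.

```ruby
# Sample data in the same shape as the output above (hard-coded for illustration).
result = [
  { album: "Some album", artist: "Jean Luc-Ponty",    date: "20 October 1989" },
  { album: "Some album", artist: "Some Other Artist", date: "29 October 1989" }
]

# values_at returns the values in the order the keys are given,
# so each hash becomes [album, artist, date].
rows = result.map { |release| release.values_at(:album, :artist, :date) }

rows.each { |row| p row }
```

Whether to keep the full date or only the year in the third slot is up to you; the question's example shows only a year, while the page appears to give a full date.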
Answer 1 (score: -1)
The `index` variable is not defined inside your second `each` block.
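To spell that out a little: there appear to be two separate problems in the question's code, and the traceback comes from the first one that gets hit. `@events` starts out empty, so `@events[index]` is `nil` and calling `[]=` on it raises `NoMethodError`. Separately, `index` is a block parameter of `each_with_index`, so it does not exist inside the later `each` block. The sketch below reproduces both without Nokogiri, using made-up sample strings, and shows one possible fix (using `||=` to create the hash on first use). The names `albums`, `artists`, `first_error`, and `second_error` are illustrative only.

```ruby
events = []

# 1) Indexing an empty array yields nil, and nil has no []= method.
begin
  events[0]['album'] = 'The Wall'
rescue NoMethodError => e
  first_error = e.class
end

# 2) A block parameter is local to its block, so it is gone afterwards.
%w[a b].each_with_index { |_item, index| index }
begin
  index
rescue NameError => e
  second_error = e.class
end

# One possible fix: give each loop its own index and create the hash on demand.
albums  = ['The Wall', 'Led Zeppelin I']
artists = ['Pink Floyd', 'Led Zeppelin']

albums.each_with_index do |album, i|
  events[i] ||= {}                 # create the slot the first time it is used
  events[i]['album'] = album
end

artists.each_with_index do |artist, i|
  events[i] ||= {}
  events[i]['artist'] = artist
end

p events
```

This only removes the exceptions. Pairing each album with the right date still needs a row-by-row walk like the one in answer 0, because the question's outer loop re-scans the whole document for every date header.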