This is my code:
doc = Nokogiri::HTML(open("http://www.cincinnatisun.com/index.php?rss/90d24f4ad98a2793", 'User-Agent' => 'ruby'))
search = doc.css('item')
if !search.blank?
  search.each do |data|
    title = data.css("title").text
    link = data.css("link").text
  end
end
But I am not getting the links.
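Answer 0 (score: 0)
Going by http://nokogiri.org/tutorials/searching_a_xml_html_document.html, something like:
@doc = Nokogiri::XML(File.read("feed.xml"))
@doc.xpath('//xmlns:link')
should do the job. Note, however, that the XML snippet you provided is not valid XML at all (no root element, item tags that are closed but never opened, and so on). The code above assumes the XML feed looks like this: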
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <item>
    <title>Atom-Powered Robots Run Amok</title>
    <link>http://example.org/2003/12/13/atom03</link>
  </item>
</feed>
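and extracts:
<link>http://example.org/2003/12/13/atom03</link>
as the result. If you run into a problem like this, try the documentation and reference material first. If you have tried something and it does not work the way you expect, then come to stackoverflow with your actual code; that makes it much easier to understand your problem and to help.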
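Answer 1 (score: 0)
A few things are wrong:
if !search.blank?
won't work in plain Ruby, because search is the NodeSet returned by doc.css, and NodeSet doesn't define a blank? method (blank? comes from ActiveSupport, so it only exists when Rails or ActiveSupport is loaded). Maybe you meant empty??
title=data.css("title").text
isn't the right way to find the title because, as with the problem above, you get back a NodeSet rather than a Node. Calling text on a NodeSet concatenates the text of every matching node, which can return a lot of garbage you don't want. Instead do:
title=data.at("title").text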
Change the code to:
require 'nokogiri'
require 'open-uri'
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs.
doc = Nokogiri::HTML(URI.open("http://www.cincinnatisun.com/index.php?rss/90d24f4ad98a2793", 'User-Agent' => 'ruby'))
search = doc.css('item')
if !search.empty?
  search.each do |data|
    title = data.at("title").text
    link = data.at("link").text
    puts "title: #{ title } link: #{ link }"
  end
end
Output:
title: Ex-Bengals cheerleaders lawsuit trial to begin link:
title: Freedom Center Offering Free Admission Monday link:
title: Miami University Band Performing in the Inaugural Parade link:
title: Northern Kentucky Man To Present Colors At Inauguration link:
title: John Gumms Monday Forecast link:
title: President Obama VP Biden sworn in officially begin second terms link:
title: Colerain Township Pizza Hut Robbed Saturday Night link:
title: Cold Snap Coming to Tri-State link:
title: 2 Men Arrested After Police Chase in Northern Kentucky link:
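link doesn't work because the XML is malformed, which in my experience is incredibly common on the internet because people don't take the time to check their work.
The fix is either to massage the XML before Nokogiri receives it, or to modify your accessor. Fortunately this particular XML is easy to work around, so this should help: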
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open("http://www.cincinnatisun.com/index.php?rss/90d24f4ad98a2793", 'User-Agent' => 'ruby'))
search = doc.css('item')
if !search.empty?
  search.each do |data|
    title = data.at("title").text
    # The URL sits in the node that follows <link>, not inside it.
    link = data.at("link").next_sibling.text
    puts "title: #{ title } link: #{ link }"
  end
end
Which outputs:
title: Ex-Bengals cheerleaders lawsuit trial to begin link: http://www.cincinnatisun.com/index.php/sid/212072454/scat/90d24f4ad98a2793
title: Freedom Center Offering Free Admission Monday link: http://www.cincinnatisun.com/index.php/sid/212072914/scat/90d24f4ad98a2793
title: Miami University Band Performing in the Inaugural Parade link: http://www.cincinnatisun.com/index.php/sid/212072915/scat/90d24f4ad98a2793
title: Northern Kentucky Man To Present Colors At Inauguration link: http://www.cincinnatisun.com/index.php/sid/212072913/scat/90d24f4ad98a2793
title: John Gumms Monday Forecast link: http://www.cincinnatisun.com/index.php/sid/212070535/scat/90d24f4ad98a2793
title: President Obama VP Biden sworn in officially begin second terms link: http://www.cincinnatisun.com/index.php/sid/212060033/scat/90d24f4ad98a2793
title: Colerain Township Pizza Hut Robbed Saturday Night link: http://www.cincinnatisun.com/index.php/sid/212057132/scat/90d24f4ad98a2793
title: Cold Snap Coming to Tri-State link: http://www.cincinnatisun.com/index.php/sid/212057131/scat/90d24f4ad98a2793
title: 2 Men Arrested After Police Chase in Northern Kentucky link: http://www.cincinnatisun.com/index.php/sid/212057130/scat/90d24f4ad98a2793
Having done all that, you can write the code more clearly, like:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open("http://www.cincinnatisun.com/index.php?rss/90d24f4ad98a2793", 'User-Agent' => 'ruby'))
# each on an empty NodeSet simply does nothing, so no empty? check is needed.
doc.css('item').each do |data|
  title = data.at("title").text
  link = data.at("link").next_sibling.text
  puts "title: #{ title } link: #{ link }"
end
Interestingly, the sample page now appears to have fixed its links.