I'm using the following Ruby script from this Dashing widget to retrieve an RSS feed, parse it, and send the parsed titles and descriptions to the widget.
require 'net/http'
require 'uri'
require 'nokogiri'
require 'htmlentities'

news_feeds = {
  "seattle-times" => "http://seattletimes.com/rss/home.xml",
}

Decoder = HTMLEntities.new

class News
  def initialize(widget_id, feed)
    @widget_id = widget_id
    # pick apart feed into domain and path
    uri = URI.parse(feed)
    @path = uri.path
    @http = Net::HTTP.new(uri.host)
  end

  def widget_id()
    @widget_id
  end

  def latest_headlines()
    response = @http.request(Net::HTTP::Get.new(@path))
    doc = Nokogiri::XML(response.body)
    news_headlines = []
    doc.xpath('//channel/item').each do |news_item|
      title = clean_html( news_item.xpath('title').text )
      summary = clean_html( news_item.xpath('description').text )
      news_headlines.push({ title: title, description: summary })
    end
    news_headlines
  end

  def clean_html( html )
    html = html.gsub(/<\/?[^>]*>/, "")
    html = Decoder.decode( html )
    return html
  end
end

@News = []
news_feeds.each do |widget_id, feed|
  begin
    @News.push(News.new(widget_id, feed))
  rescue Exception => e
    puts e.to_s
  end
end

SCHEDULER.every '60m', :first_in => 0 do |job|
  @News.each do |news|
    headlines = news.latest_headlines()
    send_event(news.widget_id, { :headlines => headlines })
  end
end
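The example RSS feed works fine because its URL points to an XML file. However, I want to use this script with another RSS source that doesn't serve an actual XML file. The feed I want is http://www.ttc.ca/RSS/Service_Alerts/index.rss, and it doesn't seem to display anything on the widget. Instead of "http://www.ttc.ca/RSS/Service_Alerts/index.rss", I also tried "http://www.ttc.ca/RSS/Service_Alerts/index.rss?format=xml" and "view-source:http://www.ttc.ca/RSS/Service_Alerts/index.rss", but had no luck. Does anyone know how I can get the actual XML data behind this RSS feed so that I can use it with this Ruby script?
Answer 0 (score: 2)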
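You're right that the link doesn't serve regular XML, so the script fails when parsing it: it was written specifically to parse the structure of the example feed. The RSS feed you're trying to parse serves RDF/XML, which you can parse with the RDFXML Ruby gem.
Something like: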
require 'nokogiri'
require 'rdf/rdfxml'
rss_feed = 'http://www.ttc.ca/RSS/Service_Alerts/index.rss'
RDF::RDFXML::Reader.open(rss_feed) do |reader|
  # use reader to iterate over elements within the document
end
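From here, you can work out how to use RDFXML to extract what you want. I'd start by checking which methods are available on the reader object:
puts reader.methods.sort - Object.instance_methods
This prints the reader's own methods. Look for ones that might suit your purpose, such as reader.each_entry.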
To dig further, you can inspect what each entry looks like:
reader.each_entry do |entry|
  puts "----here's an entry----"
  puts entry.inspect
end
Or look at the methods you can call on an entry:
reader.each_entry do |entry|
  puts "----here's an entry's methods----"
  puts entry.methods.sort - Object.instance_methods
  break
end
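The introspection idiom above works on any Ruby object, so it can be tried without the feed. Here is a small sketch that uses `StringIO` from the standard library as a stand-in for the reader (the choice of `StringIO` and the variable names are purely illustrative). Subtracting `Object.instance_methods` leaves only what the object adds on top of a plain `Object`, and `grep` narrows that down to the iterator-style methods.

```ruby
require 'stringio'

# Stand-in object; any object (such as the RDF reader) can be inspected the same way.
io = StringIO.new("line one\nline two\n")

# Methods this object responds to, minus those every Object already has.
own_methods = (io.methods - Object.instance_methods).sort

# Narrow the list to iterator-style methods, usually the best place to start.
iterators = own_methods.grep(/\Aeach/)

puts iterators
```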
I was able to roughly pull out some titles and descriptions with this hack:
RDF::RDFXML::Reader.open('http://www.ttc.ca/RSS/Service_Alerts/index.rss') do |reader|
  reader.each_object do |object|
    puts object.to_s if object.is_a? RDF::Literal
  end
end
# returns:
# TTC Service Alerts
# http://www.ttc.ca/Service_Advisories/index.jsp
# TTC Service Alerts.
# TTC.ca
# http://www.ttc.ca
# http://www.ttc.ca/images/ttc-main-logo.gif
# Service Advisory
# http://www.ttc.ca/Service_Advisories/all_service_alerts.jsp#Service+Advisory
# 196 York University Rocket route diverting northbound via Sentinel, Finch due to a collision that has closed the York U Bus way.
# - Affecting: Bus Routes: 196 York University Rocket
# 2013-12-17T13:49:03.800-05:00
# Service Advisory (2)
# http://www.ttc.ca/Service_Advisories/all_service_alerts.jsp#Service+Advisory+(2)
# 107B Keele North route diverting northbound via Keele, Lepage due to a collision that has closed the York U Bus way.
# - Affecting: Bus Routes: 107 Keele North
# 2013-12-17T13:51:08.347-05:00
But I couldn't quickly find a way to tell which one is the title and which is the description :/
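One possible way to tell the two apart, offered as a sketch rather than a fix tested against this feed: in RSS 1.0 (the RDF-based flavor), each `item` is a child of the root `rdf:RDF` element rather than of `channel`, and its `title` and `description` elements live in the `http://purl.org/rss/1.0/` namespace, so a namespace-aware XPath can select them by name. That the TTC feed follows this layout is an assumption here, not something confirmed against the live feed. `SAMPLE_FEED`, `RSS1_NS`, `child_text`, and `rss1_headlines` are made-up names, and the inline XML is invented sample data. The sketch uses REXML, which ships with Ruby, so it runs without extra gems.

```ruby
require 'rexml/document'

# Prefix-to-URI mapping for the RSS 1.0 namespace (the prefix name is arbitrary).
RSS1_NS = { 'rss' => 'http://purl.org/rss/1.0/' }

# Invented sample data in the RSS 1.0 layout; NOT a copy of the real feed.
SAMPLE_FEED = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns="http://purl.org/rss/1.0/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel rdf:about="http://example.org/feed">
      <title>Sample Alerts</title>
      <link>http://example.org/</link>
      <description>Sample channel.</description>
    </channel>
    <item rdf:about="http://example.org/alerts#1">
      <title>Service Advisory</title>
      <description>196 York University Rocket route diverting northbound.</description>
      <dc:date>2013-12-17T13:49:03-05:00</dc:date>
    </item>
    <item rdf:about="http://example.org/alerts#2">
      <title>Service Advisory (2)</title>
      <description>107B Keele North route diverting northbound.</description>
      <dc:date>2013-12-17T13:51:08-05:00</dc:date>
    </item>
  </rdf:RDF>
XML

# Text of the first child matching `path`, or an empty string if it is missing.
def child_text(element, path)
  node = REXML::XPath.first(element, path, RSS1_NS)
  node ? node.text.to_s.strip : ''
end

# Returns an array of { title:, description: } hashes, one per RSS 1.0 item.
def rss1_headlines(xml)
  doc = REXML::Document.new(xml)
  headlines = []
  REXML::XPath.each(doc, '//rss:item', RSS1_NS) do |item|
    headlines << {
      title: child_text(item, 'rss:title'),
      description: child_text(item, 'rss:description')
    }
  end
  headlines
end

rss1_headlines(SAMPLE_FEED).each do |headline|
  puts "#{headline[:title]}: #{headline[:description]}"
end
```

In the original script, Nokogiri's `xpath` accepts the same kind of prefix-to-URI hash as a second argument, for example `doc.xpath('//rss:item', 'rss' => 'http://purl.org/rss/1.0/')`, so `latest_headlines` could be adapted along the same lines.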
Finally, if you still can't work out how to extract what you need, start a new question with this information.
Good luck!