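Suppose I want to fetch a page from the web into my application and do some kind of parsing on it. How do I do that? Where should I start? Should I use a plugin or a gem? What is your usual approach to this kind of task?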
Answer 0 (score: 7)
You should try a gem such as Hpricot or Nokogiri.
Hpricot example:
require 'open-uri'
require 'rubygems'
require 'hpricot'
html = Hpricot(open(an_url).read)
# This would search for any images inside a paragraph (XPath)
html.search('/html/body//p//img')
# This would search for any images with the class "test" (CSS selector)
html.search('img.test')
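Nokogiri example:
require 'open-uri'
require 'rubygems'
require 'nokogiri'
# an_url is a placeholder for the address of the page to fetch.
# URI.open is the open-uri entry point on Ruby 2.5 and later.
html = Nokogiri::HTML(URI.open(an_url).read)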
# This would search for any images inside a paragraph (XPath)
html.xpath('/html/body//p//img')
# This would search for any images with the class "test" (CSS selector)
html.css('img.test')
Nokogiri is usually faster. Both libraries offer a lot of functionality.
Answer 1 (score: 0)