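I've been trying to parse some information from the saferweb site and am having trouble getting it to work.
If I can get the first value, I can adapt it to get the rest...
This example should return Carrier, the value next to Entity Type.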
Source:
http://safer.fmcsa.dot.gov/query.asp?searchtype=ANY&query_type=queryCarrierSnapshot&query_param=MC_MX&query_string=733709
Mechanize with Hpricot:
require 'rubygems'
require 'mechanize'
require 'hpricot'
agent = Mechanize.new
page = agent.get('http://safer.fmcsa.dot.gov/query.asp?searchtype=ANY&query_type=queryCarrierSnapshot&query_param=MC_MX&query_string=733709')
@response = page.content
doc = Hpricot(@response)
a = (doc/"/html/body/p/table/tbody/tr[2]/td/table/tbody/tr[2]/td/center[1]/table/tbody/tr[2]/td")[0].innerHTML
puts a
Nokogiri with open-uri:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open("http://safer.fmcsa.dot.gov/query.asp?searchtype=ANY&query_type=queryCarrierSnapshot&query_param=MC_MX&query_string=733709"))
ebit = doc.at("/html/body/p/table/tbody/tr[2]/td/table/tbody/tr[2]/td/center[1]/table/tbody/tr[2]/td").text
puts ebit
Answer (score: 2)
It looks like the value cells all have the same CSS class, so searching by that class is probably easier. This worked for me:
require 'nokogiri'
require 'open-uri'
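doc = Nokogiri::HTML(URI.open("http://safer.fmcsa.dot.gov/query.asp?searchtype=ANY&query_type=queryCarrierSnapshot&query_param=MC_MX&query_string=733709"))
# Get the Entity Type field (the first element with class "queryfield")
ebit = doc.at('.queryfield').text
# Remove non-breaking spaces (&nbsp;) and trim surrounding whitespace.
# Use gsub/strip, not gsub!/strip!: the bang versions return nil when
# nothing changes, so chaining them can raise NoMethodError.
ebit = ebit.gsub("\u00A0", "").strip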
puts ebit