Unable to fetch data using Nokogiri in Ruby

Time: 2015-07-17 10:18:56

Tags: ruby-on-rails ruby web-scraping nokogiri

I am trying to scrape data from a web page with Nokogiri. I want to extract the list of service centers from http://www.cardekho.com/Maruti/Noida/car-service-center.htm

The code I wrote for this is:

require 'open-uri'
require 'nokogiri'

doc = Nokogiri::HTML(open("http://www.cardekho.com/Maruti/Noida/car-service-center.htm"))

doc.css('.delrname').each do |node|
    puts node.text
end

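I have tried many combinations of CSS selectors, but none of them returns the expected result. Can anyone suggest the right selector for scraping the service center list from this link?

Thanks in advance.

PS: The same code (with the appropriate CSS selectors) works as expected when I test it on other websites, but it does not work on this one.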

2 Answers:

Answer 0: (score: 2)

Your code seems to work. I removed the space in the URL:

doc = Nokogiri::HTML(open("http://www.cardekho.com/Maruti/Noida/car-service-center.htm"))

Then I ran it, and this is the output:

$ ruby file.rb
Fast Track Auto Care India
Jkm Motors
Mangalam Motors
Motorcraft India
Motorcraft India
Rohan Motors
Rohan Motors
Rohan Motors
Vipul Motors
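
For anyone hitting the same problem, here is a minimal sketch of the whitespace fix described above. The helper names `clean_url` and `fetch_page` are made up for illustration, and the position of the stray space in the example call is an assumption, since the original URL with the space is not shown. Note also that on Ruby 3.0 and later, plain `open` no longer accepts URLs, so the sketch uses `URI.open` from open-uri instead.

```ruby
require 'uri'
require 'open-uri'

# Hypothetical helper: strip all whitespace from a pasted URL and make sure
# what is left parses as an HTTP(S) URL.
def clean_url(raw)
  url = raw.gsub(/\s+/, '')
  uri = URI.parse(url)
  raise ArgumentError, "not an HTTP URL: #{url}" unless uri.is_a?(URI::HTTP)
  url
end

# Hypothetical helper: download the page body as a String.
# URI.open is used because Kernel#open stopped handling URLs in Ruby 3.0.
def fetch_page(raw_url)
  URI.open(clean_url(raw_url)).read
end

# The space inside this URL is an assumed example of the kind of typo meant above.
puts clean_url("http://www.cardekho.com/Maruti/Noida/ car-service-center.htm")
```

The body returned by `fetch_page` can then be passed to `Nokogiri::HTML` exactly as in the question.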

Answer 1: (score: 0)

Alternatively, you can use a regular expression to get more detailed results, for example:

/(<div class="delrname">([^<]*)<\/div><p>([^<]*)<\/p><div><div class="delermobcol "><div class="clearfix"><span class="mobico sprite"><\/span><div class="mobno">([^<]*)<\/div><\/div><div class="clear"><\/div><div class="viewsercntr"><a href="([^"]*)" title="View Car Dealers for Maruti in Noida">View Car Dealers for Maruti in Noida<\/a><\/div><\/div><div class="delermoilcol"><!----><div class="clearfix"><span class="mailico sprite"><\/span><div class="mobno"><a href="mailto:([^"]*)" target="_top">[^<]*<\/a><\/div>)/

You can break the results down as follows:

# `scan` is a String method, so match against the raw HTML (fetched with open-uri, as in the question) rather than the Nokogiri document.
html = URI.open("http://www.cardekho.com/Maruti/Noida/car-service-center.htm").read

arrMatches = html.scan(/(<div class="delrname">([^<]*)<\/div><p>([^<]*)<\/p><div><div class="delermobcol "><div class="clearfix"><span class="mobico sprite"><\/span><div class="mobno">([^<]*)<\/div><\/div><div class="clear"><\/div><div class="viewsercntr"><a href="([^"]*)" title="View Car Dealers for Maruti in Noida">View Car Dealers for Maruti in Noida<\/a><\/div><\/div><div class="delermoilcol"><!----><div class="clearfix"><span class="mailico sprite"><\/span><div class="mobno"><a href="mailto:([^"]*)" target="_top">[^<]*<\/a><\/div>)/)

arrMatches.each do |dealerInfo|
  thisEntireMatch = dealerInfo[0]
  thisName = dealerInfo[1]
  thisAddress = dealerInfo[2]
  thisMobile = dealerInfo[3]
  thisLink = dealerInfo[4]
  thisEmail = dealerInfo[5]
end
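
To see how `scan` with capture groups behaves without hitting the site, here is a small offline sketch. The sample markup is an assumption modeled on the class names in the pattern above, the addresses and phone numbers are placeholders, and the shortened pattern and the `extract_dealers` helper are hypothetical. A pattern tied this closely to the markup will likely break whenever the page layout changes.

```ruby
# Sample markup: an assumption, not the real page. Addresses and numbers are placeholders.
SAMPLE_HTML = <<~HTML
  <div class="delrname">Rohan Motors</div><p>Placeholder address 1</p><div class="mobno">0000000000</div>
  <div class="delrname">Vipul Motors</div><p>Placeholder address 2</p><div class="mobno">1111111111</div>
HTML

# Shortened, hypothetical pattern: one capture group each for name, address and mobile.
DEALER_PATTERN = %r{
  <div\ class="delrname">([^<]*)</div>\s*
  <p>([^<]*)</p>\s*
  <div\ class="mobno">([^<]*)</div>
}x

# With capture groups, String#scan returns one array of captures per match.
def extract_dealers(html)
  html.scan(DEALER_PATTERN).map do |name, address, mobile|
    { name: name.strip, address: address.strip, mobile: mobile.strip }
  end
end

extract_dealers(SAMPLE_HTML).each do |dealer|
  puts "#{dealer[:name]} | #{dealer[:address]} | #{dealer[:mobile]}"
end
```

Returning one hash per dealer keeps the fields together, which is easier to work with than the separate local variables in the loop above.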