数据抓取多个数组创建和排序

时间:2016-06-07 13:01:05

标签: ruby-on-rails ruby web-scraping nokogiri mechanize

我们正在努力削减课程名称,资格和课程持续时间,并将每个课程存储在一个单独的数组中。下面我们提取所有这些,但它似乎是随机顺序,有些部分可能按页面排序等。想知道是否有人能够提供帮助。

require 'mechanize'


mechanize = Mechanize.new
@duration_array = []
@qual_array = []
@courses_array = []

page = mechanize.get('http://search.ucas.com/search/results?Vac=2&AvailableIn=2016&IsFeatherProcessed=True&page=1&providerids=41')


page.search('div.courseinfoduration').each do |x|
puts x.text.strip
page.search('div.courseinfooutcome').each do |y|
puts y.text.strip

end

while next_page_link = page.at('.pager a[text()=">"]')
  page = mechanize.get(next_page_link['href'])

page.search('div.courseinfoduration').each do |x|
    name = x
    @duration_array.push(name)
    puts x.text.strip
  end
end
while next_page_link = page.at('.pager a[text()=">"]')
  page = mechanize.get(next_page_link['href'])

page.search('div.courseinfooutcome').each do |y|
    name = y
    @qual_array.push(name)
    puts y.text.strip
  end
end
page.search('div.coursenamearea h4').each do |h4|
puts h4.text.strip

end

while next_page_link = page.at('.pager a[text()=">"]')
  page = mechanize.get(next_page_link['href'])

page.search('div.coursenamearea h4').each do |h4|
    name = h4.text
    @courses_array.push(name)
    puts h4.text.strip
  end
end
end

0 个答案:

没有答案