Screen scraping with Nokogiri

Date: 2016-10-17 05:03:54

Tags: ruby-on-rails

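I am a full-stack Ruby developer. I am trying to scrape data from a website, and I can fetch the data successfully. The problem is that the next time I fetch the data, I only want the new data; I don't want to overwrite everything in the database. I just want to add the records that were added recently. However, I can't find a solution that does this with the fewest queries and the least code.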

Here is the code I use for scraping:

    require 'mechanize' # also loads Nokogiri
    require 'uri'

    client = Mechanize.new
    index_page = client.get('https://www.google.com/')
    document_page_index = Nokogiri::HTML::Document.parse(index_page.body)
    # The second-to-last pagination link holds the number of the last page
    page_no_merchant = document_page_index.css('.pagination.pagination-centered ul li:nth-last-child(2) a').text.to_i

    1.upto(page_no_merchant) do |page_number|
      client.get("https://www.google.com/buy-gift-cards?page=#{page_number}") do |page|
        document = Nokogiri::HTML::Document.parse(page.body)

        document.css('.product-source').each do |item|
          merchant_name = item.children.css('.name').text.gsub('Gift Cards', '')
          puts merchant_name
          href = item.css('a').first.attr('href')
          puts href
          image_url = item.children.css('.img img').attr('data-src').text.strip
          puts image_url

          image_url = URI.parse(image_url)
          # Creates a new merchant row on every run, even if it already exists
          @merchant = Merchant.create!(name: merchant_name, image_url: image_url)

          first_page = client.get("https://www.google.com#{href}")
          document_page = Nokogiri::HTML::Document.parse(first_page.body)
          page_no = document_page.css('.pagination.pagination-centered ul li:nth-last-child(2) a').text.to_i

          1.upto(page_no) do |page_number_giftcard|
            # Card type labels found on this page (collected but not stored yet)
            type1 = []
            card_page = client.get("https://www.google.com#{href}?page=#{page_number_giftcard}")
            document_page = Nokogiri::HTML::Document.parse(card_page.body)

            document_page.xpath('//table/tbody/tr[@class="toggle-details"]').each do |row|
              row.at('td[2] ul').children.each do |typeli|
                type = typeli.text.strip
                type1 << type unless type.empty?
              end

              # Strip the currency and percent signs before converting to numbers
              value = row.at('td[3]').text.strip.tr('$', '').to_f
              puts value

              per_discount = row.at('td[4]').text.strip.tr('%', '').to_f
              puts per_discount

              final_price = row.at('td[5] strong').text.strip.tr('$', '').to_f
              puts final_price
              puts '******************************'

              # Creates a new gift card row on every run as well
              @giftcard = Giftcard.create(card_type: 1, card_value: value, per_off: per_discount, card_price: final_price, merchant_id: @merchant.id)
            end
          end
        end
      end
    end

Thanks in advance.

1 answer:

Answer 0 (score: 0)

This line saves every merchant again on each run, whether or not it already exists:

@merchant=Merchant.create!(name: merchant_name , image_url:image_url)

You can try find_or_create_by instead, which creates a record only when no matching one is found:

@merchant=Merchant.find_or_create_by(name: merchant_name , image_url:image_url)
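
One caveat worth keeping in mind: every attribute passed to find_or_create_by becomes part of the lookup, so if the site changes a merchant's image URL, the call above would add a second row for the same merchant. Below is a minimal sketch of the difference. `MemoryTable` is a hypothetical in-memory stand-in that only imitates the find-or-create behaviour so the example runs without a database; it is not ActiveRecord, and the merchant name and URLs are made up.

```ruby
# Hypothetical in-memory stand-in that imitates ActiveRecord's
# find_or_create_by so the sketch runs without a database.
class MemoryTable
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Return the first row whose attributes match `attrs`; otherwise build a
  # new row from `attrs`, yield it for extra attributes, and store it.
  def find_or_create_by(attrs)
    found = @rows.find { |row| attrs.all? { |key, value| row[key] == value } }
    return found if found

    row = attrs.dup
    yield row if block_given?
    @rows << row
    row
  end
end

# Lookup on name AND image_url: a changed image URL yields a duplicate.
strict = MemoryTable.new
strict.find_or_create_by(name: 'Acme', image_url: 'http://example.com/a.png')
strict.find_or_create_by(name: 'Acme', image_url: 'http://example.com/b.png')

# Lookup on name only; image_url is set only when the row is first created.
loose = MemoryTable.new
loose.find_or_create_by(name: 'Acme') { |m| m[:image_url] = 'http://example.com/a.png' }
loose.find_or_create_by(name: 'Acme') { |m| m[:image_url] = 'http://example.com/b.png' }

puts strict.rows.size
puts loose.rows.size
```

With a real model the second form would read `Merchant.find_or_create_by(name: merchant_name) { |m| m.image_url = image_url }`, assuming the merchant name alone identifies a merchant on the site.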

http://apidock.com/rails/v4.0.2/ActiveRecord/Relation/first_or_create
http://apidock.com/rails/v4.0.2/ActiveRecord/Relation/find_or_create_by
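
The question also covers gift cards and asks for as few queries as possible. One possible approach, sketched here as an assumption rather than as part of the answer above, is to load the keys of the existing rows once and keep only the scraped rows whose key is not yet known. The `new_rows` helper and the choice of key (value, discount, price) are hypothetical; in Rails the existing keys could come from a single `pluck` call per merchant.

```ruby
require 'set'

# Hypothetical helper: keep only scraped rows whose identifying key has not
# been seen before. `existing_keys` is any collection of keys already stored;
# the block maps a scraped row to its key.
def new_rows(scraped, existing_keys)
  seen = Set.new(existing_keys)
  scraped.select do |row|
    key = yield(row)
    seen.add?(key) # nil (falsy) when the key is already present
  end
end

# Keys already in the database, e.g. loaded once per merchant with
# Giftcard.where(merchant_id: id).pluck(:card_value, :per_off, :card_price)
existing = [[25.0, 10.0, 22.5]]

# Made-up rows standing in for one scraped page.
scraped = [
  { card_value: 25.0, per_off: 10.0, card_price: 22.5 }, # already stored
  { card_value: 50.0, per_off: 8.0,  card_price: 46.0 }, # new
  { card_value: 50.0, per_off: 8.0,  card_price: 46.0 }  # repeated in this run
]

fresh = new_rows(scraped, existing) { |r| r.values_at(:card_value, :per_off, :card_price) }
puts fresh.size
```

Each row in `fresh` could then be passed to `Giftcard.create`, so a run costs one read per merchant plus one insert per genuinely new card. Floats make fragile keys; storing prices as integer cents or decimals would make the comparison more robust.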