Rake task does not save or create new records in the database

Asked: 2016-08-22 19:33:12

Tags: ruby-on-rails ruby activerecord rake rakefile

I have written a Ruby script that runs fine when I execute it from the Rails console.

The script fetches some information from various websites and saves it to a table in my database.

However, when I turn the code into a rake task, the code still runs, but it does not save any new records. I also do not get any errors from rake.

# Add your own tasks in files placed in lib/tasks ending in .rake,
# for example lib/tasks/capistrano.rake, and they will automatically be available to Rake.

require File.expand_path('../config/application', __FILE__)

Rails.application.load_tasks

require './crawler2.rb'
task :default => [:crawler]

task :crawler do

### ###

require 'rubygems'   # unnecessary on Ruby >= 1.9, but harmless
require 'nokogiri'
require 'open-uri'   # note: on Ruby 3+, bare open() no longer accepts URLs; use URI.open

start = Time.now

$a = 0

sites = ["http://www.nytimes.com","http://www.news.com"]

for $a in 0..sites.size-1

url = sites[$a] 

$i = 75

$error = 0

avoid_these_links = ["/tv", "//www.facebook.com/"]

doc = Nokogiri::HTML(open(url))

    links = doc.css("a")
    hrefs = links.map { |link| link.attribute('href').to_s }
                 .uniq.sort
                 .delete_if { |href| href.empty? }
                 .delete_if { |href| avoid_these_links.any? { |w| href =~ /#{w}/ } }
                 .delete_if { |href| href.size < 10 }

#puts hrefs.length

#puts hrefs

for $i in 0...hrefs.length   # exclusive range; 0..hrefs.length would make hrefs[$i] nil on the last pass
    begin

        #puts hrefs[60] #for debugging)

    #file = open(url)
    #doc = Nokogiri::HTML(file) do

        if hrefs[$i].downcase.include? "http://"   # note: "https://" links fall through to the relative branch

            doc = Nokogiri::HTML(open(hrefs[$i]))

        else 

            doc = Nokogiri::HTML(open(url+hrefs[$i]))

        end 

        image = doc.at('meta[property="og:image"]')['content']
        title = doc.at('meta[property="og:title"]')['content']
        article_url = doc.at('meta[property="og:url"]')['content']
        description = doc.at('meta[property="og:description"]')['content']
        category = doc.at('meta[name="keywords"]')['content']

        newspaper_id = 1 


        puts "\n"
        puts $i
        #puts "Image: " + image
        #puts "Title: " + title
        #puts "Url: " + article_url
        #puts "Description: " + description
        puts "Category: " + category

            Article.create({ 
            :headline => title, 
            :caption => description, 
            :thumbnail_url => image, 
            :category_id => 3, 
            :status => true, 
            :journalist_id => 2, 
            :newspaper_id => newspaper_id, 
            :from_crawler => true,
            :description => description,
            :original_url => article_url}) unless Article.exists?(original_url: article_url)

        rescue
        #puts "Error here: " + url + hrefs[$i] if $i < hrefs.length
        $error += 1   # count the failure; the for loop advances $i by itself

    end 

end

puts "Page: " + url
puts "Articles: " + hrefs.length.to_s
puts "Errors: " + $error.to_s


end

finish = Time.now

diff = ((finish - start)/60).to_s

puts diff + " Minutes"


### ###


end
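Independent of the Rake problem, the link-filtering chain in the script can be exercised in isolation with plain Ruby. This is a sketch with made-up hrefs standing in for what Nokogiri would extract from a real page:

```ruby
avoid_these_links = ["/tv", "//www.facebook.com/"]

# hypothetical hrefs, standing in for link.attribute('href').to_s values
hrefs = [
  "",
  "/tv/schedule",
  "//www.facebook.com/newspage",
  "/short",
  "http://www.nytimes.com/2016/08/22/world/some-article.html",
  "/2016/08/22/technology/another-article.html",
  "/2016/08/22/technology/another-article.html"   # duplicate, removed by uniq
]

filtered = hrefs.uniq.sort
                .delete_if { |href| href.empty? }
                .delete_if { |href| avoid_these_links.any? { |w| href =~ /#{w}/ } }
                .delete_if { |href| href.size < 10 }

puts filtered.inspect
```

Running this keeps only the two article-looking links, dropping the empty string, the blacklisted paths, and anything shorter than ten characters.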

The code executes fine if I save the file as crawler2.rb and open it in the console with load './crawler2.rb'. When I use exactly the same code in a rake task, I get no new records.
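A side note on the "create ... unless Article.exists?" guard in the script: in ActiveRecord this can also be written as Article.find_or_create_by(original_url: article_url), though neither form is atomic, so a unique database index on original_url is the robust guard. The effect of the guard can be sketched with a tiny in-memory stand-in (everything here is hypothetical, no Rails involved):

```ruby
# A tiny in-memory stand-in for the articles table, keyed by original_url.
articles = {}

create_unless_exists = lambda do |attrs|
  key = attrs[:original_url]
  articles[key] ||= attrs   # only the first record for a given URL is kept
end

create_unless_exists.call(original_url: "http://example.com/a", headline: "First")
create_unless_exists.call(original_url: "http://example.com/a", headline: "Duplicate")
create_unless_exists.call(original_url: "http://example.com/b", headline: "Second")

puts articles.size
puts articles["http://example.com/a"][:headline]
```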

1 Answer:

Answer 0 (score: 0)

I figured out what was wrong.

I needed to remove:

require './crawler2.rb'
task :default => [:crawler]

and instead define the task with an :environment prerequisite, so that the Rails application (and with it models like Article and the database connection) is loaded before the task body runs:

task :crawler => :environment do
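Prerequisite ordering is what makes this work: Rake runs the :environment task before the dependent task's body. The mechanism can be demonstrated with plain Rake outside Rails; the task bodies below are hypothetical placeholders for Rails' real :environment task and the crawler:

```ruby
require 'rake'
include Rake::DSL

order = []

# stands in for Rails' :environment task, which boots the application
task :environment do
  order << :environment
end

# stands in for the crawler task; the prerequisite runs first
task :crawler => :environment do
  order << :crawler
end

Rake::Task[:crawler].invoke
puts order.inspect
```

Invoking :crawler runs :environment first, so order ends up as [:environment, :crawler].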

Now the crawler runs every ten minutes with the help of the Heroku Scheduler :-)

Thanks for the help everyone, and sorry about the bad formatting. Hopefully this answer helps someone else.