Question

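I am having trouble importing a large number of records from a user-provided Excel file into a database. The logic for this works fine, and I am using activerecord-import to cut down the number of database calls. However, when the file is too large, processing can take too long and Heroku returns a timeout. The solution: refactor and move the processing to a background job.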

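So far, so good. I needed to add CarrierWave to upload the file to S3, because I can't just keep the file in memory for the background job. The upload part also works: I created a model for the uploads and pass the ID to the queued job so the file can be retrieved later, since as far as I understand I can't pass a whole ActiveRecord object to the job.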

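I have installed Resque and Redis locally, and everything seems to be set up correctly on that front. I can see the jobs I create being queued and then run without failing. The job appears to run fine, but no records are added to the database. If I run my job's code line by line in the console, records are added to the database as I expect. But when the queued job I create runs, nothing happens.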

I can't work out where the problem is.

Here is the create action of my uploads controller:

def create
  @upload = Upload.new(upload_params)
  if @upload.save
    Resque.enqueue(ExcelImportJob, @upload.id)
    flash[:info] = 'File uploaded. ' \
                   'Data will be processed and added to the database.'
    redirect_to root_path
  else
    flash[:warning] = 'Upload failed. Please try again.'
    render :new
  end
end

Here is a simplified version of the job, with fewer columns listed for clarity:

class ExcelImportJob < ApplicationJob
  @queue = :default

  def perform(upload_id)
    file = Upload.find(upload_id).file.file.file
    data = parse_excel(file)
    if header_matches? data
      # Create a database entry for each row, ignoring the first header row
      # using activerecord-import
      sales = []
      data.drop(1).each_with_index do |row, index|
        sales << Sale.new(row)
        if index % 2500 == 0
          Sale.import sales
          sales = []
        end
      end
      Sale.import sales
    end
  end

  private

  def parse_excel(upload)
    # Open the uploaded excel document
    doc = Creek::Book.new upload

    # Map rows to the hash keys from the database
    doc.sheets.first.rows.map do |row|
      { date: row.values[0],
        title: row.values[1],
        author: row.values[2],
        isbn: row.values[3],
        release_date: row.values[5],
        units_sold: row.values[6],
        units_refunded: row.values[7],
        net_units_sold: row.values[8],
        payment_amount: row.values[9],
        payment_amount_currency: row.values[10] }
    end
  end

  # Returns true if header matches the expected format
  def header_matches?(data)
    data.first == {:date => 'Date',
                   :title => 'Title',
                   :author => 'Author',
                   :isbn => 'ISBN',
                   :release_date => 'Release Date',
                   :units_sold => 'Units Sold',
                   :units_refunded => 'Units Refunded',
                   :net_units_sold => 'Net Units Sold',
                   :payment_amount => 'Payment Amount',
                   :payment_amount_currency => 'Payment Amount Currency'}
  end
end
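
A side note on the batching loop in the job: because `index` starts at 0, the check `index % 2500 == 0` is already true on the first row, so the first batch holds a single record. A minimal sketch of an alternative using `each_slice` from Ruby's Enumerable follows. `FakeSale` and `import_in_batches` are hypothetical names, and `FakeSale` only records batch sizes so the snippet runs without Rails or a database.

```ruby
# Hypothetical stand-in for the Sale model: records batch sizes instead of
# writing to a database, so the sketch runs without Rails.
class FakeSale
  @batches = []

  class << self
    attr_reader :batches

    def import(records)
      @batches << records.size
    end
  end
end

BATCH_SIZE = 2500

# Import rows in fixed-size batches; the last batch holds the remainder.
def import_in_batches(rows, model: FakeSale, batch_size: BATCH_SIZE)
  rows.each_slice(batch_size) do |batch|
    model.import(batch)
  end
end

rows = Array.new(6000) { |i| { title: "Book #{i}" } }
import_in_batches(rows)
p FakeSale.batches
```

With 6000 rows this yields two full batches and one remainder batch, and an empty input triggers no import call at all.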

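There is probably room to improve the logic, since right now I hold the whole file in memory, but that isn't the problem I'm having: even with a small file of only 500 rows or so, the job adds nothing to the database.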

As I said, my code works fine when I'm not using a background job, and it still works if I run it in the console. But for some reason the job does nothing.

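This is my first time using Resque, so I don't know whether I'm missing something obvious. I did create a worker and, as I said, it does seem to be running. Here is the output from Resque's verbose formatter: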

*** resque-1.27.4: Waiting for default
*** Checking default
*** Found job on default
*** resque-1.27.4: Processing default since 1508342426 [ExcelImportJob]
*** got: (Job{default} | ExcelImportJob | [15])
*** Running before_fork hooks with [(Job{default} | ExcelImportJob | [15])]
*** resque-1.27.4: Forked 63706 at 1508342426
*** Running after_fork hooks with [(Job{default} | ExcelImportJob | [15])]
*** done: (Job{default} | ExcelImportJob | [15])

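In the Resque dashboard the jobs are not recorded as failed. They are executed, and I can see the "processed" count increment on the stats page. But as I said, the database remains untouched. What is going on? How can I debug the job more clearly? Is there a way to get Pry into it?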
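
On the debugging question, one low-tech option is to log every branch of the job, including the path where the header check fails and the import is skipped without any error. The sketch below is plain Ruby using the standard library Logger. `run_import` and its arguments are hypothetical and only stand in for the body of `perform`; it counts rows instead of importing them.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for the body of `perform`: logs each decision so a
# job that silently skips its work still leaves a trace.
def run_import(rows, expected_header, logger)
  logger.info("import started with #{rows.size} rows")

  unless rows.first == expected_header
    logger.warn("header mismatch, nothing imported: #{rows.first.inspect}")
    return 0
  end

  imported = rows.drop(1).size
  logger.info("imported #{imported} rows")
  imported
rescue StandardError => e
  # Log and re-raise so the queue still records the job as failed.
  logger.error("import failed: #{e.class}: #{e.message}")
  raise
end

log_output = StringIO.new
logger = Logger.new(log_output)

header = { title: 'Title' }
run_import([{ title: 'Titel' }, { title: 'Dune' }], header, logger)
puts log_output.string
```

Inside a Rails job, `Rails.logger` would presumably take the place of the StringIO-backed logger used here.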

Answer 1

It looks like my problem was with Resque.enqueue(ExcelImportJob, @upload.id).

I changed the code to ExcelImportJob.perform_later(@upload.id), and now my code actually runs!

I also added a resque.rake task to lib/tasks, as described here: http://bica.co/2015/01/20/active-job-resque/.
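
For reference, a typical lib/tasks/resque.rake looks roughly like the sketch below. It follows the setup the Resque README describes for Rails apps and is an assumption about what is needed, not a copy of the linked post or of my actual file.

```ruby
# lib/tasks/resque.rake
# Assumed minimal setup, following the Resque README.
require 'resque/tasks'

# Load the Rails environment before workers start, so jobs can see the models.
task 'resque:setup' => :environment
```

Active Job also has to be told to use Resque, usually with `config.active_job.queue_adapter = :resque` in config/application.rb.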

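That link also explains how to invoke a job with rails runner, without running a full Rails server and triggering the job through it, which is very useful for debugging.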

Oddly, I never managed to get the job to print to STDOUT as @hoffm suggested, but at least it sent me down a good path of exploration.

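I still don't fully understand why calling Resque.enqueue added my jobs to the queue and did appear to run them, yet none of my code was executed. If anyone has a better grasp of the difference and can explain it, I would be very grateful.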

TL;DR: calling perform_later instead of Resque.enqueue fixes the problem, but I don't know why.
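
Part of the difference shows up in the two calling conventions. A plain Resque job is a class with a class-level `self.perform`, which is what a worker calls for jobs enqueued with `Resque.enqueue`. An Active Job class defines an instance-level `perform` and is enqueued with `perform_later`, which goes through the configured queue adapter. The plain-Ruby sketch below only illustrates that interface mismatch with two hypothetical classes. It does not reproduce Resque or Active Job, and it does not by itself explain why the job was reported as done rather than failed.

```ruby
# Hypothetical Resque-style job: the worker calls the class method directly.
class PlainResqueStyleJob
  @queue = :default

  def self.perform(upload_id)
    "class-level perform ran for upload #{upload_id}"
  end
end

# Hypothetical Active Job-style job: the work lives in an instance method.
class ActiveJobStyleJob
  def perform(upload_id)
    "instance-level perform ran for upload #{upload_id}"
  end
end

puts PlainResqueStyleJob.respond_to?(:perform) # class method exists
puts ActiveJobStyleJob.respond_to?(:perform)   # no class method defined here
puts ActiveJobStyleJob.new.perform(15)
```

So a class written for one convention should be enqueued through the matching API, which is consistent with perform_later fixing the job here.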

Rails + Resque background job import not adding anything to the database
