Running out of memory in Ruby while running a rake import task

Asked: 2018-02-06 12:14:21

Tags: mysql ruby-on-rails ruby

I am running a task that imports roughly 1 million orders. I loop over the data to update it to the values on the new database, and it runs fine on my local machine with 8 GB of RAM.

However, when I run it on my AWS t2.medium instance, it gets through the first 500,000 rows, but towards the end, when it starts actually creating the orders that do not yet exist, I start maxing out my memory. I am porting a MySQL database to Postgres.

Am I missing something obvious here?

require 'mysql2' # or require 'pg'

require 'active_record'

def legacy_database
  @client ||= Mysql2::Client.new(Rails.configuration.database_configuration['legacy_production'])
end

desc "import legacy orders"
task orders: :environment do
  orders = legacy_database.query("SELECT * FROM oc_order")

  # init progressbar
  progressbar = ProgressBar.create(:total => orders.count, :format => "%E, \e[0;34m%t: |%B|\e[0m")

  orders.each do |order|
    if [1, 2, 13, 14].include? order['order_status_id']
      payment_method = "wx"
      if order['paid_by'] == "Alipay"
        payment_method = "ap"
      elsif order['paid_by'] == "UnionPay"
        payment_method = "up"
      end

      user_id = User.where(import_id: order['customer_id']).first
      if user_id
        user_id = user_id.id
      end

      order = Order.create(
        # id: order['order_id'],
        import_id: order['order_id'],
        # user_id: order['customer_id'],
        user_id: user_id,
        receiver_name: order['payment_firstname'],
        receiver_address: order['payment_address_1'],
        created_at: order['date_added'],
        updated_at: order['date_modified'],
        paid_by: payment_method,
        order_num: order['order_id']
      )

      #increment progress bar on each save
      progressbar.increment
    end
  end
end

4 Answers:

Answer 0 (score: 3)

I assume that this line, orders = legacy_database.query("SELECT * FROM oc_order"), loads the entire table into memory, which is very inefficient.

You need to iterate over the table in batches. With ActiveRecord there is the find_each method for this. Since you are not using ActiveRecord for the legacy table, you may want to implement your own batch querying using limit and offset.
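A minimal sketch of such a batch loop, assuming a `fetch_page` callable that wraps the `LIMIT`/`OFFSET` query (simulated here with an in-memory array so the flow is self-contained; in the rake task it would issue a query against `legacy_database`):

```ruby
# Iterate over a large result set one page at a time, so only batch_size
# rows are in memory at once.
def each_batch(batch_size:, fetch_page:)
  offset = 0
  loop do
    rows = fetch_page.call(batch_size, offset)
    break if rows.empty?
    rows.each { |row| yield row }
    offset += batch_size
  end
end

# Simulated data source; in practice fetch_page would run
# "SELECT * FROM oc_order LIMIT #{limit} OFFSET #{offset}".
all_rows = (1..10).to_a
seen = []
each_batch(batch_size: 3,
           fetch_page: ->(limit, offset) { all_rows[offset, limit] || [] }) do |row|
  seen << row
end
# seen now holds all ten rows, fetched three at a time
```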

Answer 1 (score: 2)

To handle memory efficiently, you can run the MySQL query in batches, as suggested by nattfodd.

According to the MySQL documentation, there are two ways to achieve this:

SELECT * FROM oc_order LIMIT 5,10;
or
SELECT * FROM oc_order LIMIT 10 OFFSET 5;

Both queries will return rows 6-15.

You can pick an offset of your choice and run the query in a loop until your orders object comes back empty.

Assuming you process 1,000 orders at a time, you would have something like this:

batch_size = 1000
offset = 0
loop do
  orders = legacy_database.query("SELECT * FROM oc_order LIMIT #{batch_size} OFFSET #{offset}")

  break unless orders.present?

  offset += batch_size

  orders.each do |order|

    ... # your logic of creating new model objects
  end
end

It is also advisable to wrap code that runs in production with proper error handling:

begin
  ... # main logic
rescue => e
  ... # handle the error
ensure
  ... # cleanup that must always run
end

Answer 2 (score: 1)

Disabling row caching while iterating over the orders collection should reduce memory consumption:

# Options from the mysql2 gem: stream rows from the server as they are
# iterated instead of buffering the whole result, and do not retain rows
# that have already been yielded.
orders = legacy_database.query(
  "SELECT * FROM oc_order",
  stream: true,
  cache_rows: false
)

Answer 3 (score: 0)

There is a gem that helps us do this: activerecord-import.

bulk_orders = []

orders.each do |order|
  bulk_orders << Order.new(
    # id: order['order_id'],
    import_id: order['order_id'],
    # user_id: order['customer_id'],
    user_id: user_id,
    receiver_name: order['payment_firstname'],
    receiver_address: order['payment_address_1'],
    created_at: order['date_added'],
    updated_at: order['date_modified'],
    paid_by: payment_method,
    order_num: order['order_id']
  )
end

Order.import bulk_orders, validate: false

This performs the insert with a single INSERT statement.
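Note that accumulating all one million Order objects before a single import call would itself use a lot of memory. A sketch of flushing the buffer in slices instead, with a hypothetical FakeOrder stand-in so the flow runs without a database (the real task would call Order.import from activerecord-import):

```ruby
# Stand-in for an ActiveRecord model extended by activerecord-import;
# its import class method just records what would have been inserted.
class FakeOrder
  @inserted = []
  class << self
    attr_reader :inserted

    def import(records, validate: false)
      @inserted.concat(records)
    end
  end
end

BATCH_SIZE = 3
buffer = []
(1..10).each do |row|
  buffer << row # in the real task: buffer << Order.new(...)
  if buffer.size >= BATCH_SIZE
    FakeOrder.import(buffer, validate: false)
    buffer = [] # reassign rather than clear, since import kept a reference
  end
end
FakeOrder.import(buffer, validate: false) unless buffer.empty?
# at most BATCH_SIZE records are buffered in memory at any point
```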