Best way to populate a database from a large Ruby/JSON file?

Time: 2018-08-13 14:36:17

Tags: ruby-on-rails ruby activerecord rake

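Suppose I have a Ruby or JSON file made up of hashes, somewhere between 14 and 20 MB (300K lines uncompressed). I created a rake task that loops over each hash and creates an AR object from the values in that hash.

Unfortunately, because of the file's size, I hit a stack level too deep error every time I run the task. The only way I have actually gotten the script to run is by splitting the file into smaller files. That works, but splitting the file and rerunning the task over and over is tedious. What are good options for loading and processing a large file?

Rake task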

namespace :db do
  task populate: :environment do
    $restaurants.each_with_index do |r, index|
      uri = URI(r[:website])

      restaurant = Restaurant.find_or_create_by(name: r[:name], website: "#{uri.scheme}://#{uri.host}")

      restaurant.cuisines = r[:cuisines].map { |c| Cuisine.find_or_create_by(name: c) }

      location = Location.create(
        restaurant: restaurant,
        city_id: 1,
        address: r[:address],
        latitude: r[:latitude],
        longitude: r[:longitude],
        phone_number: r[:phone_number]
      )

      r[:hours].each do |h|
        Hour.create(
          location: location,
          day: Date::DAYNAMES.index(h[:day]),
          opens: h[:opens],
          closes: h[:closes]
        )
      end

      menu_group = MenuGroup.create(
        restaurant: restaurant,
        locations: [location],
        address: r[:address]
      )

      r[:menus].each do |m|
        menu = Menu.create(
          menu_group: menu_group,
          position: m[:position],
          name: m[:name]
        )

        m[:sections].each do |s|
          section = Section.create(
            menu: menu,
            position: s[:position],
            name: s[:name]
          )

          s[:dishes].each do |d|
            tag = Tag.find_or_create_by(
              name: d[:name].downcase.strip
            )

            Dish.find_or_create_by(
              restaurant: restaurant,
              sections: [section],
              tags: [tag],
              name: d[:name],
              description: d[:description]
            )
          end
        end
      end

      puts "#{index + 1} of #{$restaurants.size} completed"
    end
  end
end

Error

rake aborted!
SystemStackError: stack level too deep
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/compile_cache/iseq.rb:12:in `to_binary'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/compile_cache/iseq.rb:12:in `input_to_storage'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/compile_cache/iseq.rb:37:in `fetch'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/compile_cache/iseq.rb:37:in `load_iseq'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:21:in `require'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:21:in `block in require_with_bootsnap_lfi'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/load_path_cache/loaded_features_index.rb:65:in `register'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:20:in `require_with_bootsnap_lfi'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:29:in `require'
/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.0/lib/active_support/dependencies.rb:283:in `block in require'
/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.0/lib/active_support/dependencies.rb:249:in `load_dependency'
/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.0/lib/active_support/dependencies.rb:283:in `require'
/Users/user/app/lib/tasks/populate.rake:1:in `<main>'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:50:in `load'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:50:in `load'
/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.0/lib/active_support/dependencies.rb:277:in `block in load'
/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.0/lib/active_support/dependencies.rb:249:in `load_dependency'
/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.0/lib/active_support/dependencies.rb:277:in `load'
/usr/local/lib/ruby/gems/2.5.0/gems/railties-5.2.0/lib/rails/engine.rb:650:in `block in run_tasks_blocks'
/usr/local/lib/ruby/gems/2.5.0/gems/railties-5.2.0/lib/rails/engine.rb:650:in `each'
/usr/local/lib/ruby/gems/2.5.0/gems/railties-5.2.0/lib/rails/engine.rb:650:in `run_tasks_blocks'
/usr/local/lib/ruby/gems/2.5.0/gems/railties-5.2.0/lib/rails/application.rb:515:in `run_tasks_blocks'
/usr/local/lib/ruby/gems/2.5.0/gems/railties-5.2.0/lib/rails/engine.rb:459:in `load_tasks'
/Users/user/app/Rakefile:6:in `<top (required)>'
/usr/local/lib/ruby/gems/2.5.0/gems/rake-12.3.1/exe/rake:27:in `<top (required)>'
(See full trace by running task with --trace)

1 answer:

Answer 0 (score: 0)

I would use something like Sidekiq to break the work up into workers that can run concurrently.

For example:

$restaurants.each_with_index do |r, index|
    RestaurantParser.perform_async(r, index)
end

Inside RestaurantParser, perform the same steps you would normally perform for each restaurant.
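A minimal sketch of what such a worker might look like, under a few assumptions. Sidekiq serializes job arguments as JSON, so a hash enqueued with symbol keys reaches `perform` with string keys; the sketch converts them back so lookups like `r[:name]` keep working. The `deep_symbolize` helper is a hypothetical plain-Ruby stand-in for ActiveSupport's `deep_symbolize_keys`, and the guarded `include` exists only so the snippet loads without the gem installed. The actual import steps are left as a comment because they are the unchanged body of the rake task's loop.

```ruby
require "json"

# Hypothetical helper: plain-Ruby stand-in for ActiveSupport's
# Hash#deep_symbolize_keys, so the sketch also runs outside Rails.
def deep_symbolize(value)
  case value
  when Hash
    value.each_with_object({}) { |(k, v), out| out[k.to_sym] = deep_symbolize(v) }
  when Array
    value.map { |v| deep_symbolize(v) }
  else
    value
  end
end

class RestaurantParser
  # A real worker would simply `include Sidekiq::Worker`; the guard only lets
  # this sketch load when the gem is absent.
  include Sidekiq::Worker if defined?(Sidekiq)

  def perform(r, index)
    r = deep_symbolize(r) # restore r[:name], r[:hours], ... style lookups
    # The body of the rake task's loop (Restaurant.find_or_create_by, ...)
    # would go here, unchanged.
    r # returned only so the demonstration below can inspect it
  end
end

# Imitate what happens to the arguments between perform_async and perform.
queued = JSON.parse(JSON.generate({ name: "Example", hours: [{ day: "Monday" }] }))
restored = RestaurantParser.new.perform(queued, 0)
puts restored[:hours].first[:day]
```

Converting the keys once at the top of `perform` keeps the rest of the import code identical to the rake task version.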

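As long as a restaurant does not depend on other restaurants already in the database, you can run the workers concurrently to speed up the process.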
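Separately, the trace in the question points at line 1 of populate.rake and at bootsnap's compile cache, which suggests the error is raised while the data file is being loaded as Ruby source, before the loop ever starts. If that is the case, keeping the data as JSON and parsing it when the task runs should sidestep that step. A minimal sketch, assuming the data is available as a single JSON array; `load_restaurants` is a hypothetical helper name, not something from the question.

```ruby
require "json"

# Hypothetical helper (not from the question): read the data when the task
# runs instead of loading it as Ruby source.
# symbolize_names: true keeps r[:name]-style lookups working at every
# nesting level.
def load_restaurants(path)
  JSON.parse(File.read(path), symbolize_names: true)
end
```

Inside the task, `$restaurants.each_with_index` would then become something like `load_restaurants(path).each_with_index`, where `path` is wherever the JSON file lives in the app.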