Can you use MongoDB map/reduce to migrate data?

Asked: 2012-05-30 22:53:28

Tags: ruby-on-rails mapreduce mongoid

I have a large collection and I'd like to modify all of its documents by populating a field.

A simple example would be caching the comment count for each post:

class Post
  include Mongoid::Document
  field :comment_count, type: Integer
  has_many :comments
end

class Comment
  include Mongoid::Document
  belongs_to :post
end

I can run this serially with:

Post.all.each do |p|
  p.update_attribute :comment_count, p.comments.count
end

But it takes 24 hours to run (it's a large collection). I'm wondering whether Mongo's map/reduce could be used for this, but I haven't seen a good example of it.

I imagine you would map over the comments collection and then store the reduced result in the posts collection. Am I on the right track?

1 Answer:

Answer 0 (score: 0):

You can use MongoDB map/reduce to "help" migrate your data; unfortunately, you can't use it to do the migration completely server-side. You are on the right track. The basic idea is:

  1. Map each comment to emit(post_id, {comment_count: 1}) ---> {_id: post_id, value: {comment_count: 1}}
  2. Reduce to the value {comment_count: N}, where N is the sum of the counts ---> {_id: post_id, value: {comment_count: N}}
  3. Specify the output option {reduce: 'posts'} to reduce the map/reduce comment_counts back into the posts collection

After some rough investigation I found that you can get close, but there is a problem that prevents you from doing the migration completely server-side. The result of a reduce has the shape {_id: KEY, value: MAP_REDUCE_VALUE}. For now we are stuck with this shape, and there appears to be no way around it. So you can neither feed the complete original document, outside of this shape, into the reduce (in fact, data outside this shape will be lost), nor have the reduce update a document outside of this shape. Therefore the "final" update of your posts collection has to be done programmatically from the client side. It looks like fixing this would make a nice feature request. (A minimal sketch of the resulting shape follows below.)
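To make the shape issue concrete, here is a minimal sketch of steps 1 and 2, using the same driver call as the full test further down (the scratch collection name here is arbitrary). Each result document carries the count nested under value, not as a top-level field on the post:

    # Minimal sketch of the map and reduce, mirroring the test below.
    map = "function() { emit( this.post_id, {comment_count: 1} ); }"
    reduce = <<-EOF
      function(key, values) {
        var result = {comment_count: 0};
        values.forEach(function(value) { result.comment_count += value.comment_count; });
        return result;
      }
    EOF

    # Writing to a scratch collection works, but every result document has the
    # fixed shape {_id: post_id, value: {comment_count: N}}.
    results = Comment.collection.map_reduce(map, reduce, out: 'map_reduce_results')
    results.find.each { |doc| p doc }  # {"_id"=>..., "value"=>{"comment_count"=>N}}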

Below, find a working example that demonstrates how to use MongoDB map/reduce in Ruby to calculate all the comment_counts. I then programmatically use the map_reduce_results collection to update comment_count in the posts collection. Pruned from the attempted server-side use is the output option for the reduce: {reduce: 'posts'}.

You can verify my answer with a bit of experimentation, or, if you like, I can post the non-working fully-server-side attempt on request, complete with models; a rough sketch of what such an attempt might look like follows the test output below. Hope this helps with understanding MongoDB map/reduce in Ruby.

test/unit/comment_test.rb

    require 'test_helper'
    
    class CommentTest < ActiveSupport::TestCase
      def setup
        @map_reduce_results_name = 'map_reduce_results'
        delete_all
      end
    
      def delete_all
        Post.delete_all
        Comment.delete_all
        Mongoid.database.drop_collection(@map_reduce_results_name)
      end
    
      def dump(title = nil)
        yield
        puts title
        Post.all.to_a.each do |post|
          puts "#{post.to_json} #{post.comments.collect(&:text).to_json}"
        end
      end
    
      def generate
        (2+rand(2)).times do |p|
          post = Post.create(text: 'post_' + p.to_s)
          comments = (2+rand(3)).times.collect do |c|
            Comment.create(text: "post_#{p} comment_#{c}")
          end
          post.comments = comments
        end
      end
    
      def generate_and_migrate(title = nil)
        dump(title + ' generate:') { generate }
        dump(title + ' migrate:') { yield }
      end
    
      test "map reduce migration" do
        generate_and_migrate('programmatic') do
          Post.all.each do |p|
            p.update_attribute :comment_count, p.comments.count
          end
        end
        delete_all
        generate_and_migrate('map/reduce') do
          map = "function() { emit( this.post_id, {comment_count: 1} ); }"
          reduce = <<-EOF
            function(key, values) {
              var result = {comment_count: 0};
              values.forEach(function(value) { result.comment_count += value.comment_count; });
              return result;
            }
          EOF
          out = @map_reduce_results_name #{reduce: 'posts'}
          result_coll = Comment.collection.map_reduce(map, reduce, out: out)
          puts "#{@map_reduce_results_name}:"
          result_coll.find.each do |doc|
            p doc
            Post.find(doc['_id']).update_attribute :comment_count, doc['value']['comment_count'].to_i
          end
        end
      end
    end
    

Test output (sorry for the mix of JSON and Ruby inspect):

    Run options: --name=test_map_reduce_migration
    
    # Running tests:
    
    programmatic generate:
    {"_id":"4fcae3bde4d30b21e2000001","comment_count":null,"text":"post_0"} ["post_0 comment_0","post_0 comment_1","post_0 comment_2"]
    {"_id":"4fcae3bde4d30b21e2000005","comment_count":null,"text":"post_1"} ["post_1 comment_1","post_1 comment_0","post_1 comment_2","post_1 comment_3"]
    {"_id":"4fcae3bde4d30b21e200000a","comment_count":null,"text":"post_2"} ["post_2 comment_1","post_2 comment_3","post_2 comment_0","post_2 comment_2"]
    programmatic migrate:
    {"_id":"4fcae3bde4d30b21e2000001","comment_count":3,"text":"post_0"} ["post_0 comment_0","post_0 comment_1","post_0 comment_2"]
    {"_id":"4fcae3bde4d30b21e2000005","comment_count":4,"text":"post_1"} ["post_1 comment_1","post_1 comment_0","post_1 comment_2","post_1 comment_3"]
    {"_id":"4fcae3bde4d30b21e200000a","comment_count":4,"text":"post_2"} ["post_2 comment_1","post_2 comment_3","post_2 comment_0","post_2 comment_2"]
    map/reduce generate:
    {"_id":"4fcae3bee4d30b21e200000f","comment_count":null,"text":"post_0"} ["post_0 comment_0","post_0 comment_1"]
    {"_id":"4fcae3bee4d30b21e2000012","comment_count":null,"text":"post_1"} ["post_1 comment_2","post_1 comment_0","post_1 comment_1"]
    {"_id":"4fcae3bee4d30b21e2000016","comment_count":null,"text":"post_2"} ["post_2 comment_0","post_2 comment_1","post_2 comment_2","post_2 comment_3"]
    map_reduce_results:
    {"_id"=>BSON::ObjectId('4fcae3bee4d30b21e200000f'), "value"=>{"comment_count"=>2.0}}
    {"_id"=>BSON::ObjectId('4fcae3bee4d30b21e2000012'), "value"=>{"comment_count"=>3.0}}
    {"_id"=>BSON::ObjectId('4fcae3bee4d30b21e2000016'), "value"=>{"comment_count"=>4.0}}
    map/reduce migrate:
    {"_id":"4fcae3bee4d30b21e200000f","comment_count":2,"text":"post_0"} ["post_0 comment_0","post_0 comment_1"]
    {"_id":"4fcae3bee4d30b21e2000012","comment_count":3,"text":"post_1"} ["post_1 comment_2","post_1 comment_0","post_1 comment_1"]
    {"_id":"4fcae3bee4d30b21e2000016","comment_count":4,"text":"post_2"} ["post_2 comment_0","post_2 comment_1","post_2 comment_2","post_2 comment_3"]
    .
    
    Finished tests in 0.072870s, 13.7231 tests/s, 0.0000 assertions/s.
    
    1 tests, 0 assertions, 0 failures, 0 errors, 0 skips
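
For reference, here is a rough sketch along the lines of the non-working fully-server-side attempt (the real code is available on request; this sketch just assumes the out: {reduce: 'posts'} route from step 3). It does not achieve the migration because the results are still forced into the {_id, value} shape, overwriting the matched posts documents rather than setting a top-level comment_count on them:

    # Rough sketch only -- this is the approach that does NOT work server-side.
    map = "function() { emit( this.post_id, {comment_count: 1} ); }"
    reduce = <<-EOF
      function(key, values) {
        var result = {comment_count: 0};
        values.forEach(function(value) { result.comment_count += value.comment_count; });
        return result;
      }
    EOF

    # out: {reduce: 'posts'} re-reduces the new results against the existing posts
    # documents, but the output is still written in the {_id, value} shape, so a
    # post ends up as {_id: ..., value: {comment_count: N}} instead of gaining a
    # top-level comment_count field (and its other fields, like text, are lost).
    Comment.collection.map_reduce(map, reduce, out: {reduce: 'posts'})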