Question

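I need to create a large list of more than 100,000 contacts in MongoDB (inserting the List_Id into each contact record). My idea for a solution is: add the first 100 contacts, then return to the client so the UI can display those first 100 contacts. The remaining contacts will be added afterwards.

My problem is that I want the process/method to keep running in the background after it returns to the client. My gut says a thread is a good solution for this.

Sample code: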

def add_contacts_to_list
    count = 0
    thread = Thread.new{
      @contacts.each do |contact|
         add_to_list(contact, list_id)
         count = count + 1
         # what I want:
         #   when count == 100, return to the client while the thread
         #   keeps running as if nothing happened
      end
    }
    thread.join
end

Answer 1

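It sounds like you're on the right track with using a separate thread for the larger process. You'll probably want to fetch the initial 100 results and then submit a job to something like sidekiq or resque to run the larger operation. Threading is a complicated topic, and different servers behave differently. The venerable Aaron Patterson (tenderlove) discusses it in this post: http://tenderlovemaking.com/2012/06/18/removing-config-threadsafe.html (that really is his domain, and it's almost entirely safe for work, for what it's worth). If you're going to run processes this large, you will definitely need to move them off the blocking request thread, so again I'd suggest looking at some kind of job queue like the ones mentioned above.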
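As a rough illustration of that "first 100 now, the rest as a job" shape, here is a minimal sketch that uses only Ruby's standard library. `BackgroundJobs` is a hypothetical stand-in for sidekiq or resque (it is not their API), and `add_to_list` and `ADDED` are placeholders for the real MongoDB write.

```ruby
# Sketch only: BackgroundJobs is a hypothetical stand-in for sidekiq/resque,
# built on Ruby's built-in thread-safe Queue.
module BackgroundJobs
  JOBS = Queue.new

  # A single worker thread pops jobs and runs them one at a time.
  WORKER = Thread.new do
    while (job = JOBS.pop) # pop blocks until a job is available
      break if job == :stop
      begin
        job.call
      rescue StandardError => e
        warn "job failed: #{e.message}"
      end
    end
  end

  def self.enqueue(&job)
    JOBS << job
  end

  # Lets the already queued jobs finish, then stops the worker.
  def self.shutdown
    JOBS << :stop
    WORKER.join
  end
end

# Placeholder for the real add_to_list (no MongoDB involved here).
ADDED = Queue.new
def add_to_list(contact, list_id)
  ADDED << [contact, list_id]
end

def add_contacts_to_list(contacts, list_id)
  first = contacts.take(100)
  rest = contacts.drop(100)
  first.each { |c| add_to_list(c, list_id) }                           # done before returning
  BackgroundJobs.enqueue { rest.each { |c| add_to_list(c, list_id) } } # done later
  first                                                                # what the UI can show
end

shown = add_contacts_to_list((1..250).to_a, "list-1")
puts "returned #{shown.size} contacts to the client"
BackgroundJobs.shutdown
puts "total added: #{ADDED.size}"
```

A real job queue also gives you persistence and retries across process restarts, which this in-process sketch does not.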

Answer 2

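For an application with a very small load, your approach seems fine...

...but if your application will run under heavy load, you should address the issues that job queues are normally used to solve, such as:

Multithreading can corrupt data (even on MRI, the protection the global lock offers is limited).
Creating too many threads can cause a significant slowdown, leaving the app unresponsive.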
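To make the first point concrete, here is a minimal sketch of guarding shared state with a `Mutex`. `ContactList` is a hypothetical in-memory stand-in for the MongoDB-backed list; the point is only that a check-then-append step needs a lock when several threads run it.

```ruby
# Minimal sketch: ContactList is a hypothetical in-memory stand-in for the
# MongoDB-backed list. The Mutex makes the check-then-append step atomic.
class ContactList
  def initialize
    @members = []
    @lock = Mutex.new
  end

  # Adds the contact once, even when several threads try at the same time.
  def add(contact)
    @lock.synchronize do
      @members << contact unless @members.include?(contact)
    end
  end

  def size
    @lock.synchronize { @members.size }
  end
end

list = ContactList.new
contacts = (1..200).to_a

# Four threads all try to add the same 200 contacts.
threads = 4.times.map do
  Thread.new { contacts.each { |c| list.add(c) } }
end
threads.each(&:join)

puts list.size # prints 200: each contact was added exactly once
```

Without the `synchronize` block, two threads could both pass the `include?` check for the same contact and append it twice.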

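That said, I'll try to write examples for both approaches: your approach, with a separate thread per request, and a very simple home-made que.

Taking your code and adjusting it slightly for the fact that a thread returns as soon as it is created and runs in the background, your code would look something like this: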

def add_contacts_to_list
    # create a proc, so the code doesn't repeat itself (DRY)
    the_job = Proc.new do |contact_list|
        contact_list.each {|c| add_to_list(c, @list_id)}
    end
    # add the first 100 contacts before returning
    the_job.call @contacts.take(100)
    # Send the rest to a thread (drop returns [] when there are fewer than 100)
    Thread.new { the_job.call @contacts.drop(100) }
    # that's it. we now return and the thread works in the background.
end
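To check that a method really does return while its thread is still working, here is a small self-contained sketch. `process_in_background` and the `done` queue are hypothetical names used only for this demonstration; the `sleep` simulates slow work.

```ruby
# Hypothetical demo: the first two items are handled synchronously,
# the rest in a thread. Queue is thread-safe, so no extra locking is needed.
def process_in_background(items, done)
  first = items.take(2)
  rest = items.drop(2)
  first.each { |i| done << i } # synchronous part
  # Thread.new returns a Thread object at once; the block runs concurrently.
  Thread.new do
    rest.each do |i|
      sleep 0.01 # simulate slow work
      done << i
    end
  end
end

done = Queue.new
worker = process_in_background((1..10).to_a, done)
puts "method returned; #{done.size} items done so far"
worker.join # only needed here so the script waits before exiting
puts "after join: #{done.size} items done"
```

In a long-running server process the `join` is not needed. In a plain script it matters, because Ruby kills the remaining threads when the main thread exits.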

On the other hand, here is a simple Que module (for demonstration only) that would work better:

module SimpleQue
    QUE = []
    QUE_LOCKER = Mutex.new
    @kill_thread = false

    def self.que_job *args, &job
        raise "Cannot que jobs after que was set to finish!" if @kill_thread
        raise "Missing a job to que!" unless job
        QUE_LOCKER.synchronize { QUE << [job, args] }
        true 
    end

    THREAD = Thread.new do
        begin
            until @kill_thread && QUE.empty?
                # wait for work, but stop waiting once the que is set to finish
                sleep 0.5 while !@kill_thread && QUE_LOCKER.synchronize { QUE.empty? }
                job, args = QUE_LOCKER.synchronize { QUE.shift }
                job.call(*args) if job
            end
        rescue => e
            # change this to handle errors
            puts e
            retry
        end
    end

    def self.join
        @kill_thread = true
        THREAD.join
    end

end

# test it:
SimpleQue.que_job("hi!") {|s| sleep 1; puts s}
SimpleQue.que_job("nice!") {|s| sleep 1; puts s}
SimpleQue.que_job("hi!") {|s| sleep 1; puts s}
SimpleQue.que_job("hi!") {|s| sleep 1; puts s}
SimpleQue.que_job("yo!") {|s| sleep 1; puts s}
SimpleQue.que_job("bye!") {|s| sleep 1; puts s}
puts "sent everything to the que, now about to wait using #join."
SimpleQue.join
# this last call raises, because the que was already set to finish:
SimpleQue.que_job("hi?") {|s| sleep 1; puts s}

# adjusting your code, ignoring multithreading issues:

def add_contacts_to_list
    # create a proc, so the code doesn't repeat itself (DRY)
    the_job = Proc.new do |contact_list|
        contact_list.each {|c| add_to_list(c, @list_id)}
    end
    # add the first 100 contacts before returning
    the_job.call @contacts.take(100)
    # Send the rest to the que (drop returns [] when there are fewer than 100)
    SimpleQue.que_job(@contacts.drop(100), &the_job)
    # that's it. we now return and the que works in the background.
end


# adjusting your code, adding basic multithreading safety:

def add_contacts_to_list
    # sending a job to the que:
    SimpleQue.que_job(@contacts) do |contact_list|
        contact_list.each {|c| add_to_list(c, @list_id)}
    end
    # I removed: the_job.call @contacts[0..99]
    # it's better if you didn't even start the first 100 contacts...
    # ...it might cause data corruption when different threads do it.
end

Answer 3

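I would schedule it in the background (sidekiq or similar) and redirect the client to a page that can display the contacts. For a feature like this, I think taking on thread management is an unnecessary risk.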

@contacts.take(100).each { |contact| add_to_list(contact) }
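perform_async     # placeholder: enqueue the rest as a background job (sidekiq or similar)
return_to_client  # placeholder: respond or redirect so the UI can show the contacts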

Return from a Ruby method but keep running in the background
