I'm writing a rake task that will be called by Whenever every minute (possibly every 30 seconds) and that contacts a polling API endpoint for every user in our database. Obviously this isn't efficient to run in a single thread, but is it possible to multithread it? If not, is there a good evented HTTP library that could get the job done?
Answer 0 (score: 12)
I'm writing a rake task that will be called by Whenever every minute (possibly every 30 seconds).
Be careful about Rails boot time here; you're better off with a forking model such as Resque or Sidekiq. Resque offers https://github.com/bvandenbos/resque-scheduler, which should be able to do what you need. I can't speak for Sidekiq, but I'm sure it has something similar (Sidekiq is newer than Resque).
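For reference, a minimal resque-scheduler setup could look something like the sketch below. The job class, queue name, and 30-second interval are assumptions made for illustration; check the resque-scheduler README for the exact schedule options your version supports.

# config/initializers/resque_scheduler.rb
Resque.schedule = {
  'poll_users' => {
    'every'       => '30s',        # or 'cron' => '* * * * *' for once a minute
    'class'       => 'PollUsersJob',
    'queue'       => 'polling',
    'description' => 'Poll the API endpoint for every user'
  }
}

# app/jobs/poll_users_job.rb
class PollUsersJob
  @queue = :polling

  def self.perform
    # kick off the batched polling shown further down
  end
end

The scheduler process only enqueues the job; the actual work runs in a Resque worker forked from an already-booted parent, so you don't pay the Rails boot cost on every run.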
Obviously this isn't efficient to run in a single thread, but is it possible to multithread it? If not, is there a good evented HTTP library that could get the job done?
I'd suggest you take a look at ActiveRecord's find_each / find_in_batches for processing the users in batches efficiently; once you have a batch, you can easily do something with threads along these lines:
#
# find_in_batches yields the users in groups; you can pass a
# :batch_size option to tune the group size for your available RAM
#
Users.find_in_batches do |batch_of_users|
  #
  # Each batch is an Array of users, always smaller than or
  # equal to the chosen batch size
  #
  # We collect a bunch of new threads, one for each user
  #
  batch_threads = batch_of_users.collect do |user|
    #
    # We pass the user to the thread; this is a good habit
    # for shared variables, even if in this case it doesn't
    # make much difference
    #
    Thread.new(user) do |u|
      #
      # Do the API call here, using `u` (not `user`)
      # to access the user instance
      #
      # We shouldn't need an evented HTTP library: Ruby threads
      # release control while IO is in progress, and control
      # returns whenever the scheduler decides, but 99% of the
      # time HTTP and network IO are the most thread-friendly
      # thing you can do in Ruby.
      #
    end
  end
  #
  # Joining threads means waiting for them all to finish
  # before moving on to the next batch.
  #
  batch_threads.map(&:join)
end
This will start no more than batch_size threads at a time, waiting for each batch of batch_size threads to finish before starting the next.
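As for the call itself, plain Net::HTTP inside each thread is enough. Below is a minimal sketch of a helper you could invoke from the `Thread.new(user) do |u|` block above; the endpoint URL and the `poll_api_for` name are assumptions for illustration, not part of the original answer.

require 'net/http'

# Hypothetical helper: poll the (assumed) endpoint for a single user.
# The thread blocks on the network IO, but the other threads keep running.
def poll_api_for(u)
  uri      = URI("https://api.example.com/poll?user_id=#{u.id}")
  response = Net::HTTP.get_response(uri)
  Rails.logger.warn("Polling failed for user #{u.id}") unless response.is_a?(Net::HTTPSuccess)
  response
end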
You could also skip the batching and start one thread per user up front, but then you'd have an uncontrolled number of threads. There's an alternative you might benefit from here; it gets a bit more complicated, involving a ThreadPool and a shared list of work to do. I've posted it as a Gist rather than spamming Stack Overflow: https://gist.github.com/6767fbad1f0a66fa90ac
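A minimal sketch of that idea — a fixed pool of worker threads draining a shared Queue — reusing the hypothetical poll_api_for helper from above; the pool size of 10 is an arbitrary assumption:

# A fixed number of worker threads drain a shared queue of users,
# so the thread count stays bounded no matter how many users there are.
pool_size  = 10
work_queue = Queue.new

Users.find_each { |user| work_queue << user }

workers = pool_size.times.map do
  Thread.new do
    begin
      # pop(true) is non-blocking and raises ThreadError once the queue is empty
      while (user = work_queue.pop(true))
        poll_api_for(user)
      end
    rescue ThreadError
      # queue drained; this worker is done
    end
  end
end

workers.each(&:join)

This caps the number of in-flight HTTP requests at the pool size regardless of how many users you have, which is the property the ThreadPool approach is after.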
Answer 1 (score: 3)