Clojure agents consuming from a queue

Posted: 2010-04-08 19:28:04

Tags: concurrency clojure queue

I'm trying to figure out the best way to use agents to consume items from a message queue (Amazon SQS). Right now I have a function (process-queue-item) that grabs an item from the queue and processes it.

I want to process these items concurrently, but I can't wrap my head around how to control the agents. Basically I want to keep all of the agents as busy as possible without pulling too many items off the queue and building up a backlog (I'll be running this on a couple of machines, so items need to stay in the queue until they are really needed).

Can anyone give me some pointers on improving my implementation?

(def active-agents (ref 0))

(defn process-queue-item [_]
  (dosync (alter active-agents inc))
  ;retrieve item from Message Queue (Amazon SQS) and process
  (dosync (alter active-agents dec)))

(defn -main []
  (def agents (for [x (range 20)] (agent x)))

  (loop [loop-count 0]

    (if (< @active-agents 20)
      (doseq [agent agents]
        (if (agent-errors agent)
          (clear-agent-errors agent))
        ;should skip this agent until later if it is still busy processing (not sure how)
        (send-off agent process-queue-item)))

    ;(apply await-for (* 10 1000) agents)
    (Thread/sleep  10000)
    (logging/info (str "ACTIVE AGENTS " @active-agents))
    (if (> loop-count 10)
      (do (logging/info (str "done, let's cleanup " loop-count))
       (doseq [agent agents]
         (if (agent-errors agent)
           (clear-agent-errors agent)))
       (apply await agents)
       (shutdown-agents))
      (recur (inc loop-count)))))

4 Answers:

Answer 0 (score: 23)

(let [switch (atom true) ; a switch to stop workers
      workers (doall 
                (repeatedly 20 ; 20 workers pulling and processing items from SQS
                  #(future (while @switch
                             ;; retrieve an item from Amazon SQS and process it,
                             ;; e.g. the question's (process-queue-item nil)
                             (process-queue-item nil)))))]
  (Thread/sleep 100000) ; arbitrary rule to decide when to stop ;-)
  (reset! switch false) ; stop !
  (doseq [worker workers] @worker)) ; waiting for all workers to be done
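
For context, a hedged sketch (not part of the original answer) of how this pattern could be packaged around the question's process-queue-item as a helper that returns a stop function; the helper name and the nil argument are illustrative:

(defn start-workers
  "Start n workers that repeatedly call process-queue-item until stopped.
  Returns a no-argument function that stops the workers and waits for them."
  [n]
  (let [switch  (atom true)
        workers (doall (repeatedly n #(future (while @switch
                                                (process-queue-item nil)))))]
    (fn []
      (reset! switch false)     ; ask each worker to finish its current item and exit
      (doseq [w workers] @w)))) ; block until every future has returned

;; usage: (def stop-workers! (start-workers 20)) ... later: (stop-workers!)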

Answer 1 (score: 6)

What you're asking for is a way to keep handing out tasks, but with some upper limit on how many are in flight at once. One simple way to do that is with a semaphore to coordinate the limit. Here's how I would approach it:

(let [limit (.availableProcessors (Runtime/getRuntime))
      ; note: you might choose limit 20 based upon your problem description
      sem (java.util.concurrent.Semaphore. limit)]
  (defn submit-future-call
    "Takes a function of no args and yields a future object that will
    invoke the function in another thread, and will cache the result and
    return it on all subsequent calls to deref/@. If the computation has
    not yet finished, calls to deref/@ will block. 
    If n futures have already been submitted, then submit-future blocks
    until the completion of another future, where n is the number of
    available processors."  
    [#^Callable task]
    ; take a slot (or block until a slot is free)
    (.acquire sem)
    (try
      ; create a future that will free a slot on completion
      (future (try (task) (finally (.release sem))))
      (catch java.util.concurrent.RejectedExecutionException e
        ; no task was actually submitted
        (.release sem)
        (throw e)))))

(defmacro submit-future
  "Takes a body of expressions and yields a future object that will
  invoke the body in another thread, and will cache the result and
  return it on all subsequent calls to deref/@. If the computation has
  not yet finished, calls to deref/@ will block.
  If n futures have already been submitted, then submit-future blocks
  until the completion of another future, where n is the number of
  available processors."  
  [& body] `(submit-future-call (fn [] ~@body)))

#_(example
    user=> (submit-future (reduce + (range 100000000)))
    #<core$future_call$reify__5782@6c69d02b: :pending>
    user=> (submit-future (reduce + (range 100000000)))
    #<core$future_call$reify__5782@38827968: :pending>
    user=> (submit-future (reduce + (range 100000000)))
    ;; blocks at this point for a 2 processor PC until the previous
    ;; two futures complete
    #<core$future_call$reify__5782@214c4ac9: :pending>
    ;; then submits the job

Now it's just a matter of coordinating how the tasks themselves get processed, and it sounds like you already have a mechanism for that: loop on (submit-future (process-queue-item)), roughly as sketched below.
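
A hedged sketch of that loop, assuming a hypothetical queue-has-items? predicate and the question's process-queue-item (which ignores its argument):

(defn consume-queue
  "Keep submitting queue items for processing. submit-future blocks
  whenever all the slots are taken, so no backlog builds up."
  []
  (loop []
    (when (queue-has-items?) ; hypothetical predicate: the queue still has work
      (submit-future (process-queue-item nil))
      (recur))))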

Answer 2 (score: 4)

Maybe you could use the seque function? Quoting (doc seque):

clojure.core/seque
([s] [n-or-q s])
  Creates a queued seq on another (presumably lazy) seq s. The queued
  seq will produce a concrete seq in the background, and can get up to
  n items ahead of the consumer. n-or-q can be an integer n buffer
  size, or an instance of java.util.concurrent BlockingQueue. Note
  that reading from a seque can block if the reader gets ahead of the
  producer.
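
As a rough illustration of that behaviour (fetch-item below is just a stand-in for the real SQS call, not something from the answer):

(defn fetch-item [i]
  (Thread/sleep 100) ; pretend network latency
  i)

;; the background producer stays at most 5 items ahead of the consumer
(def buffered-items (seque 5 (map fetch-item (range))))

;; consuming is ordinary seq access; it blocks if the consumer outruns the producer
(take 3 buffered-items) ;=> (0 1 2)

Because range produces chunked seqs, fetch-item will actually be called a chunk at a time here, which is the chunking caveat mentioned below.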

What I have in mind is a lazy sequence of queue items fetched over the network; you'd wrap it in seque, put that in a Ref, and have the worker agents consume items off this seque. seque returns something that, from your code's point of view, looks like a regular seq, with the queue magic happening transparently. Note that if the sequence you put inside is chunked, it will still be forced a chunk at a time. Also note that the initial call to seque itself seems to block until an initial item or two is obtained (or one chunk, as appropriate; I think that has more to do with how lazy sequences work than with seque itself, though).

A sketch of the code (really rough, untested):

(defn get-queue-items-seq []
  (lazy-seq
    (cons (get-queue-item)
          (get-queue-items-seq))))

(def task-source (ref (seque (get-queue-items-seq))))

(defn do-stuff []
  (let [worker (agent nil)]
    (if-let [result (dosync
                      (when-let [task (first @task-source)]
                        (send worker (fn [_] (do-stuff-with task)))))]
      (do (await worker)
          ;; maybe do something with worker's state
          (do-stuff))))) ;; continue working

(defn do-lots-of-stuff []
  (let [fs (doall (repeatedly 20 #(future (do-stuff))))]
    fs))

In reality you'd probably want a more sophisticated producer of the queue-item seq, so that you can ask it to stop producing new items (that's necessary if the whole thing is to be able to shut down gracefully; the futures will die when the task source runs dry, and you can use future-done? to see whether they already have). That's just what I can see at first glance... I'm sure there's more to polish here. I think the general approach would work, though.
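
To make the shutdown concrete, a hedged sketch of one way the producer could be made stoppable (the producing? atom is illustrative and not part of the original answer):

(def producing? (atom true))

(defn get-queue-items-seq []
  (lazy-seq
    (when @producing?
      (cons (get-queue-item)
            (get-queue-items-seq)))))

;; graceful shutdown, assuming fs holds the futures returned by do-lots-of-stuff:
;; (reset! producing? false)
;; (every? future-done? fs)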

Answer 3 (score: 0)

Not sure how idiomatic this is, since I'm still new to the language, but the following solution works for me:

(let [number-of-messages-per-time 2
      await-timeout 1000]
  (doseq [p-messages (partition number-of-messages-per-time messages)]
    (let [agents (map agent p-messages)]
      (doseq [a agents] (send-off a process))
      (apply await-for await-timeout agents)
      (map deref agents))))
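
The snippet assumes messages and process already exist; purely illustrative stand-ins (not part of the answer) might look like this:

;; stand-in for the items pulled from SQS
(def messages (range 10))

;; stand-in for the real processing function; as an agent action it
;; receives the agent's current state (here, the message) as its argument
(defn process [message]
  (println "processing" message)
  message)

One caveat: (map deref agents) in the last line is lazy and its result is discarded, so the derefs never actually run; if the results are needed, wrapping it as (doall (map deref agents)) or using (mapv deref agents) would force them.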