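How to robustly make a large number of concurrent HTTPS requests in Clojure (/ Java)

Time: 2016-03-10 10:44:00

Tags: networking asynchronous clojure core.async http-kit

I have a stream of inputs, and for each input I want to make 2 HTTPS network requests before passing the result on to another part of the program. Typical throughput is 50 inputs per second.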

for each input:
    HTTP request A
    HTTP request B
    pass event on with (A.body and B.body)

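I am using the http-kit client, which is asynchronous by default. It returns a promise and can also take a callback. Http-kit uses Java NIO.

The rate of incoming inputs, combined with the time each request takes, is high enough that the requests need to be made asynchronously.

I have tried 3 approaches:

  1. When an event comes in, put it on a channel. A number of go routines take from the channel. Each one makes the requests and "blocks" the go block by dereferencing the promise from the HTTP request. This doesn't work, because I don't think the promise plays well with threads.
  2. When an event comes in, immediately start a future, which "blocks" on the async promises. This results in very high CPU usage, plus starvation of network resources.
  3. When an event comes in, immediately fire the http-kit request for request A, passing in a callback that makes request B, which in turn is passed a callback that passes the event on. This leads to an out-of-memory error after a few hours.

All of these handle the load for a while. All of them eventually crash. The most recent crash, after about 12 hours: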

    Mar 10, 2016 2:05:59 AM com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector run
    WARNING: com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@1bc8a7f5 -- APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending
     tasks!
    Mar 10, 2016 3:38:38 AM com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector run
    WARNING: com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@1bc8a7f5 -- APPARENT DEADLOCK!!! Complete Status:
            Managed Threads: 3
            Active Threads: 1
            Active Tasks:
                    com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@65d8b232 (com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#0)
            Pending Tasks:
                    com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@359acb0d
    Pool thread stack traces:
            Thread[com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#0,5,main]
                    com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:560)
            Thread[com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#1,5,main]
                    java.lang.Object.wait(Native Method)
                    com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:534)
            Thread[com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2,5,main]
                    java.lang.Object.wait(Native Method)
                    com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:534)
    
    
    Thu Mar 10 04:38:34 UTC 2016 [client-loop] ERROR - select exception, should not happen
    java.lang.OutOfMemoryError: Java heap space
            at java.io.ByteArrayOutputStream.<init>(ByteArrayOutputStream.java:77)
            at sun.security.ssl.OutputRecord.<init>(OutputRecord.java:76)
            at sun.security.ssl.EngineOutputRecord.<init>(EngineOutputRecord.java:65)
            at sun.security.ssl.HandshakeOutStream.<init>(HandshakeOutStream.java:63)
            at sun.security.ssl.Handshaker.activate(Handshaker.java:514)
            at sun.security.ssl.SSLEngineImpl.kickstartHandshake(SSLEngineImpl.java:717)
            at sun.security.ssl.SSLEngineImpl.beginHandshake(SSLEngineImpl.java:743)
            at org.httpkit.client.HttpClient.finishConnect(HttpClient.java:310)
            at org.httpkit.client.HttpClient.run(HttpClient.java:375)
            at java.lang.Thread.run(Thread.java:745)
    Mar 10, 2016 4:56:34 AM baleen.events invoke
    SEVERE: Thread error: Java heap space
    java.lang.OutOfMemoryError: Java heap space
    Mar 10, 2016 5:00:43 AM baleen.events invoke
    SEVERE: Thread error: Java heap space
    java.lang.OutOfMemoryError: Java heap space
    Mar 10, 2016 4:58:25 AM baleen.events invoke
    SEVERE: Thread error: Java heap space
    java.lang.OutOfMemoryError: Java heap space
    

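I don't know what the cause of the failure is. It might be that too many closures are being held, or a gradual resource leak, or thread starvation.

Questions

1. Does making 50 HTTP requests per second, each of which might take 200 ms, meaning there might be 100 requests in flight at any given time, sound like an excessive load?

2. How can I do this in a way that handles the throughput and is robust?

Edit

The YourKit profiler tells me that I have about 2 GB of char[]s held via org.httpkit.client.Handler objects, which are in turn held via java.util.concurrent.FutureTask objects. This suggests that references to old Handlers (i.e. requests) are somehow being retained. The whole reason for trying callbacks was to avoid this (although they might somehow be caught up in closures).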

2 answers:

Answer 0: (score: 1)

  
      
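  1. Does making 50 HTTP requests per second, each of which might take 200 ms, meaning there might be 100 requests in flight at any given time, sound like an excessive load?

On modern hardware, definitely not.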
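As a rough cross-check, Little's law estimates the average number of requests in flight as the request rate multiplied by the time each request spends in flight. The sketch below assumes two requests per event and a fixed 200 ms latency; the function name `expected-in-flight` is hypothetical and only illustrates the arithmetic.

```clojure
(defn expected-in-flight
  "Average number of concurrent requests by Little's law:
  (events per second) * (requests per event) * (latency in seconds).
  Latency is given in milliseconds to keep the arithmetic exact."
  [events-per-sec requests-per-event latency-ms]
  (/ (* events-per-sec requests-per-event latency-ms) 1000))

;; 50 events/s, 2 requests per event, 200 ms per request
(expected-in-flight 50 2 200)
;; => 20
```

Under these assumptions the steady-state average is about 20 concurrent requests, which is lower than the 100 estimated in the question. Real latencies vary, so peaks can be higher.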

  
      
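  2. How can I do this in a way that handles the throughput and is robust?

You can do this by combining core.async pipelines with http-kit's callbacks. You don't really need to create a go routine for each request (although that shouldn't hurt), because you can use the asynchronous put! from within the http-kit callback.

Use a bounded buffer for each step of the pipeline to limit the number of active connections, which will (at least) be constrained by the number of ephemeral TCP ports available on your system.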
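A minimal sketch of that bounding effect, assuming only core.async is on the classpath. The HTTP call is replaced by a hypothetical `fake-request-async` that invokes its callback after about 20 ms, and the two atoms exist only to observe concurrency; none of these names come from http-kit or from the original post.

```clojure
(ns bounded-sketch.core
  (:require [clojure.core.async :as async]))

;; Counters used only to observe concurrency in this sketch.
(def in-flight (atom 0))
(def max-in-flight (atom 0))

(defn fake-request-async
  "Hypothetical stand-in for an http-kit call: invokes callback with a
  response map after about 20 ms without blocking the calling thread."
  [id callback]
  (swap! max-in-flight max (swap! in-flight inc))
  (async/go
    (async/<! (async/timeout 20))
    (swap! in-flight dec)
    (callback {:id id :status 200})))

(defn fetch-into
  "Async function in the shape pipeline-async expects: put the result on
  out-chan from the callback, then close out-chan."
  [id out-chan]
  (fake-request-async id
                      (fn [resp]
                        (async/put! out-chan resp)
                        (async/close! out-chan))))

(defn run-bounded
  "Push ids through a pipeline with parallelism n and return the
  responses as a vector, in input order."
  [n ids]
  (let [in  (async/to-chan ids)
        out (async/chan n)]
    (async/pipeline-async n out fetch-into in)
    (async/<!! (async/into [] out))))
```

Calling `(run-bounded 4 (range 50))` and then reading `@max-in-flight` should show a peak close to the requested parallelism rather than 50. The effective limit can be slightly above n because of how the pipeline buffers work internally, so treat n as an approximate cap.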

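Here is an example of a small program that does something similar to what you describe. It reads "events" from a channel (in this case, each event is the ID "1") and looks those IDs up on an HTTP service. It takes the response from the first call, reads the JSON key "next", and enqueues that value as the URL for step 2. Finally, when the second lookup completes, it adds an event to the out channel, which a go routine monitors in order to report statistics.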

(ns concur-req.core
  (:require [clojure.core.async :as async]
           [cheshire.core :refer [decode]]
           [org.httpkit.client :as http]))

(defn url-of
  [id]
  ;; this service responds within 100-200ms
  (str "http://localhost:28080/" id ".json"))

(defn retrieve-json-async
  [url c]
  (http/get url nil
            (fn [{body :body status :status :as resp}]
              (if (= 200 status)
                (async/put! c (decode body true))
                (println "ERROR:" resp))
              (async/close! c))))

(defn run [parallelism stop-chan]
  (let [;; allocate half of the parallelism to each step
        step1-n    (int (max (/ parallelism 2) 1))
        step2-n    step1-n
        ;; buffer to take ids, transform them into urls
        step1-chan (async/chan step1-n (map url-of))
        ;; buffer for result of pulling urls from step1, xform by extracting :next url
        step2-chan (async/chan step2-n (map :next))
        ;; buffer to count completed results
        out-chan   (async/chan 1 (map (constantly 1)))
        ;; for delivering the final result
        final-chan (async/chan)
        start-time (System/currentTimeMillis)]

    ;; process URLs from step1 and put the result in step2
    (async/pipeline-async step1-n step2-chan retrieve-json-async step1-chan)
    ;; process URLs from step2 and put the result in out
    (async/pipeline-async step2-n out-chan retrieve-json-async step2-chan)

    ;; keep the input channel full until stop-chan is closed.
    (async/go-loop []
      (let [[v c] (async/alts! [stop-chan [step1-chan "1"]])]
        (if (= c stop-chan)
          (async/close! step1-chan)
          (recur))))

    ;; count messages on out-chan until the pipeline is closed, printing
    ;; status message every second
    (async/go-loop [status-timer (async/timeout 1000) subt 0 accu 0]
      (let [[v c] (async/alts! [status-timer out-chan])]
        (cond (= c status-timer)
              (do (println subt "records...")
                  (recur (async/timeout 1000) 0 (+ subt accu)))

              (nil? v)
              (async/>! final-chan (+ subt accu))

              :else
              (recur status-timer (+ v subt) accu))))

    ;; block until done, then emit final report.
    (let [final-total (async/<!! final-chan)
          elapsed-ms  (- (System/currentTimeMillis) start-time)
          elapsed-s   (/ elapsed-ms 1000.0)]
      (print (format "Processed %d records with parallelism %d in %.3f seconds (%d/sec)\n"
                     final-total parallelism elapsed-s
                     (int (/ final-total elapsed-s)))))))

(defn run-for
  [seconds parallelism]
  (let [stop-chan (async/chan)]
    (future
      (Thread/sleep (* seconds 1000))
      (async/close! stop-chan))
    (run parallelism stop-chan)))

(do
  ;; Warm up the connection pool, avoid somaxconn problems...
  (doseq [p (map #(* 20 (inc %)) (range 25))]
    (run-for 1 p))
  (run-for (* 60 60 6) 500))

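To test this, I set up an HTTP service that responds only after sleeping for a random time between 100 and 200 ms. I then ran this program on my Macbook Pro for 6 hours.

With parallelism set to 500, I averaged 1155 completed transactions per second (2310 completed HTTP requests per second). I'm sure this could be higher with some tuning, especially by moving the HTTP service to a different machine. JVM memory crept up to 1.5 GB within the first 30 minutes and then stayed at that size. I'm using Oracle's 64-bit 1.8 JVM.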

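Answer 1: (score: 0)

A variation on your first approach (dereferencing the promise returned by http-kit inside a go block) may be workable, provided it is done in a way that does not block the core.async handler threads on the promise. You can do that by combining http-kit's callbacks with core.async: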
(defn handle-event
  "Return a core.async channel that will contain the result of making both HTTP call A and B."
  [event-data]
  (let [event-a-chan (clojure.core.async/chan)
        event-b-chan (clojure.core.async/chan)
        return-chan (clojure.core.async/chan)]
    ;; http-kit's request takes one options map (including :url) and an optional callback
    (org.httpkit.client/request {:url "https://event-a-call"
                                 :method :get
                                 :query-params {"param1-k" "param1-v"}}
                                (fn [resp]
                                  (clojure.core.async/put! event-a-chan resp)))
    (org.httpkit.client/request {:url "https://event-b-call"
                                 :method :get
                                 :query-params {"param1-k" "param1-v"}}
                                (fn [resp]
                                  (clojure.core.async/put! event-b-chan resp)))
    ;; park (not block) until both responses have arrived, then emit them together
    (clojure.core.async/go
      (clojure.core.async/>! return-chan {:event-a-response (clojure.core.async/<! event-a-chan)
                                          :event-b-response (clojure.core.async/<! event-b-chan)}))
    return-chan))
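
The same fan-out and join pattern can also carry a deadline, so that a response that never arrives does not leave a go block parked forever. The following is a sketch under assumptions, not part of the answer above: `fake-call-async` is a hypothetical stand-in for an http-kit request, and `call-chan` and `handle-event-with-timeout` are illustrative names.

```clojure
(ns join-sketch.core
  (:require [clojure.core.async :as async]))

(defn fake-call-async
  "Hypothetical stand-in for an http-kit request: calls callback with a
  response map after delay-ms."
  [label delay-ms callback]
  (async/go
    (async/<! (async/timeout delay-ms))
    (callback {:status 200 :body label})))

(defn call-chan
  "Start the fake call and return a channel that will receive its response."
  [label delay-ms]
  (let [c (async/chan 1)]
    (fake-call-async label delay-ms #(async/put! c %))
    c))

(defn handle-event-with-timeout
  "Start calls A and B concurrently. Return a channel that yields both
  responses, or :timeout if they have not both arrived within timeout-ms."
  [delay-a delay-b timeout-ms]
  (let [a        (call-chan "A" delay-a)
        b        (call-chan "B" delay-b)
        deadline (async/timeout timeout-ms)]
    (async/go
      (let [[ra ca] (async/alts! [a deadline])]
        (if (= ca deadline)
          :timeout
          (let [[rb cb] (async/alts! [b deadline])]
            (if (= cb deadline)
              :timeout
              {:event-a-response ra :event-b-response rb})))))))
```

A caller can take from the returned channel with `<!` inside a go block, or with `<!!` from an ordinary thread, and treat `:timeout` as a failed event.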