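I have an input stream, and for each input I want to make 2 HTTPS network requests before passing the result on to another part of the program. Typical throughput is 50 per second.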
for each input:
    HTTP request A
    HTTP request B
    pass event on with (A.body and B.body)
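Here is a minimal, self-contained sketch of the loop body above in callback style. `fake-http-get` is a hypothetical stand-in for an asynchronous client such as http-kit, not http-kit's API. It runs the callback on a scheduler thread after 20 ms. The URLs are placeholders. B is issued from A's callback, on the assumption that B may depend on A.

```clojure
(ns example.per-input
  (:import (java.util.concurrent Executors ThreadFactory TimeUnit)))

;; Daemon threads, so the simulated client never keeps the JVM alive.
(def scheduler
  (Executors/newScheduledThreadPool
   2
   (reify ThreadFactory
     (newThread [_ r] (doto (Thread. ^Runnable r) (.setDaemon true))))))

(defn fake-http-get
  "Hypothetical async client (not http-kit's API): calls `callback` with a
  response map on a scheduler thread after 20 ms; the caller never waits."
  [url callback]
  (.schedule scheduler
             (reify Runnable
               (run [_] (callback {:status 200 :body (str "body of " url)})))
             20 TimeUnit/MILLISECONDS))

(defn process-input
  "Makes request A; from A's callback makes request B; from B's callback
  passes both bodies on to `pass-on`."
  [input pass-on]
  (fake-http-get (str "https://a.example/" input)
                 (fn [a]
                   (fake-http-get (str "https://b.example/" input)
                                  (fn [b]
                                    (pass-on {:input  input
                                              :a-body (:body a)
                                              :b-body (:body b)}))))))
```

No thread waits while either request is in flight. The only state that lives until B completes is whatever the nested callbacks close over.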
I'm using the http-kit client, which is asynchronous by default. It returns a promise, and can also take a callback. Http-kit uses Java NIO (see here and here).
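The rate of incoming events, combined with the time each request takes, is high enough that this needs to be done asynchronously.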
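I've tried 3 approaches:
1. go routines reading off a channel. Each one makes the requests and "blocks" the go block by deref-ing the promises returned by the HTTP requests. This didn't work, because I don't think the promises play well with the go threads.
2. A future per input, which "blocks" on the async promises. This resulted in very high CPU usage, plus starvation of network resources.
3. An http-kit request for A, passing in a callback that makes request B, which in turn gets a callback that passes the event on. This led to an out-of-memory error after a few hours.

These all handle the volume for a while, but they all crash eventually. The most recent crash, after about 12 hours: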
Mar 10, 2016 2:05:59 AM com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector run
WARNING: com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@1bc8a7f5 -- APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending
tasks!
Mar 10, 2016 3:38:38 AM com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector run
WARNING: com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@1bc8a7f5 -- APPARENT DEADLOCK!!! Complete Status:
Managed Threads: 3
Active Threads: 1
Active Tasks:
com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@65d8b232 (com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#0)
Pending Tasks:
com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@359acb0d
Pool thread stack traces:
Thread[com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#0,5,main]
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:560)
Thread[com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#1,5,main]
java.lang.Object.wait(Native Method)
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:534)
Thread[com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2,5,main]
java.lang.Object.wait(Native Method)
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:534)
Thu Mar 10 04:38:34 UTC 2016 [client-loop] ERROR - select exception, should not happen
java.lang.OutOfMemoryError: Java heap space
at java.io.ByteArrayOutputStream.<init>(ByteArrayOutputStream.java:77)
at sun.security.ssl.OutputRecord.<init>(OutputRecord.java:76)
at sun.security.ssl.EngineOutputRecord.<init>(EngineOutputRecord.java:65)
at sun.security.ssl.HandshakeOutStream.<init>(HandshakeOutStream.java:63)
at sun.security.ssl.Handshaker.activate(Handshaker.java:514)
at sun.security.ssl.SSLEngineImpl.kickstartHandshake(SSLEngineImpl.java:717)
at sun.security.ssl.SSLEngineImpl.beginHandshake(SSLEngineImpl.java:743)
at org.httpkit.client.HttpClient.finishConnect(HttpClient.java:310)
at org.httpkit.client.HttpClient.run(HttpClient.java:375)
at java.lang.Thread.run(Thread.java:745)
Mar 10, 2016 4:56:34 AM baleen.events invoke
SEVERE: Thread error: Java heap space
java.lang.OutOfMemoryError: Java heap space
Mar 10, 2016 5:00:43 AM baleen.events invoke
SEVERE: Thread error: Java heap space
java.lang.OutOfMemoryError: Java heap space
Mar 10, 2016 4:58:25 AM baleen.events invoke
SEVERE: Thread error: Java heap space
java.lang.OutOfMemoryError: Java heap space
I don't know what is causing the failure. It could be too many closures, a gradual resource leak, or thread starvation.
Questions
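Does making 50 HTTP requests a second, each of which may take 200 ms, meaning there could be 100 requests in flight at any given time, sound like an excessive burden?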
How can I do this in a way that handles the throughput and is robust?
EDIT
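The YourKit profiler tells me I have about 2 GB of char[]s held via org.httpkit.client.Handlers via java.util.concurrent.FutureTasks. That suggests references to old Handlers (i.e. requests) are somehow being retained. The whole reason for trying callbacks was to avoid this (although the references might somehow be getting caught up in closures).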
Answer 0 (score: 1)
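- Does making 50 HTTP requests a second, each of which may take 200 ms, meaning there could be 100 requests in flight at any given time, sound like an excessive burden?

On modern hardware, it definitely does not.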
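Here is a rough check using the figures from the question and Little's law, $L = \lambda W$ (average number in flight equals arrival rate times average time in the system):

$$
L = \lambda W = \left(50\ \tfrac{\text{inputs}}{\text{s}} \times 2\ \tfrac{\text{requests}}{\text{input}}\right) \times 0.2\ \text{s} = 20\ \text{requests in flight on average}
$$

Even counting both requests per input, that is about 20 concurrent requests on average, well under the 100 estimated in the question. Bursts will push the instantaneous number higher.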
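- How can I do this in a way that handles the throughput and is robust?

You can do this with a combination of core.async pipelines and http-kit's callbacks. You don't really need to create a go routine for each request (although that shouldn't hurt), because you can use async `put!` from the http-kit callback.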
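Use bounded buffers for each step of the pipeline to limit the number of active connections. That number will (at least) be limited by the number of ephemeral TCP ports available on your system.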
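Bounded buffers are a form of backpressure: the producer cannot start new work until a slot frees up. Below is a minimal, self-contained sketch of the same idea without core.async. It swaps in `java.util.concurrent.Semaphore` for bounded channels. `fake-http-get`, `run-bounded`, the 10 ms delay and the URLs are hypothetical stand-ins, not http-kit's API.

```clojure
(ns example.bounded
  (:import (java.util.concurrent CountDownLatch Executors Semaphore
                                 ThreadFactory TimeUnit)
           (java.util.concurrent.atomic AtomicInteger)))

;; Daemon threads, so the simulated client never keeps the JVM alive.
(def scheduler
  (Executors/newScheduledThreadPool
   4
   (reify ThreadFactory
     (newThread [_ r] (doto (Thread. ^Runnable r) (.setDaemon true))))))

(defn fake-http-get
  "Hypothetical async client: calls `callback` on a scheduler thread after
  `delay-ms`, without blocking the caller."
  [url delay-ms callback]
  (.schedule scheduler
             (reify Runnable (run [_] (callback {:status 200 :body url})))
             (long delay-ms) TimeUnit/MILLISECONDS))

(defn run-bounded
  "Issues one request per url with at most `limit` in flight. The producer
  blocks on the semaphore when all slots are taken (backpressure), and each
  callback releases its slot. Returns the highest in-flight count seen."
  [urls limit]
  (let [slots     (Semaphore. (int limit))
        in-flight (AtomicInteger. 0)
        max-seen  (AtomicInteger. 0)
        done      (CountDownLatch. (long (count urls)))]
    (doseq [url urls]
      (.acquire slots)                          ; wait for a free slot
      (let [n (.incrementAndGet in-flight)]
        (loop []                                ; record the maximum (CAS loop)
          (let [m (.get max-seen)]
            (when (and (> n m) (not (.compareAndSet max-seen m n)))
              (recur)))))
      (fake-http-get url 10
                     (fn [_resp]
                       (.decrementAndGet in-flight)
                       (.release slots)         ; free the slot for the next request
                       (.countDown done))))
    (.await done)                               ; wait for the last callbacks
    (.get max-seen)))
```

The counter is decremented before the permit is released, so the in-flight count can never exceed `limit`. The core.async version in this answer gets the same effect from the fixed buffer sizes and the `n` argument of `pipeline-async`.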
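Here is an example of a small program that does something similar to what you describe. It reads "events" from a channel (in this case, each event is the ID "1") and looks those IDs up on an HTTP service. It takes the response from that first call, looks up the JSON key "next", and enqueues that as the URL for step 2. Finally, when this lookup is complete, it adds an event to the `out` channel, and a go routine monitors that channel to report statistics.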
(ns concur-req.core
  (:require [clojure.core.async :as async]
            [cheshire.core :refer [decode]]
            [org.httpkit.client :as http]))

(defn url-of
  [id]
  ;; this service responds within 100-200ms
  (str "http://localhost:28080/" id ".json"))

;; pipeline-async calls this with an input and a result channel; the
;; http-kit callback puts the result (if any) and then closes the channel.
(defn retrieve-json-async
  [url c]
  (http/get url nil
            (fn [{body :body status :status :as resp}]
              (if (= 200 status)
                (async/put! c (decode body true))
                (println "ERROR:" resp))
              (async/close! c))))

(defn run [parallelism stop-chan]
  (let [;; allocate half of the parallelism to each step
        step1-n    (int (max (/ parallelism 2) 1))
        step2-n    step1-n
        ;; buffer to take ids, transform them into urls
        step1-chan (async/chan step1-n (map url-of))
        ;; buffer for result of pulling urls from step1, xform by extracting :next url
        step2-chan (async/chan step2-n (map :next))
        ;; buffer to count completed results
        out-chan   (async/chan 1 (map (constantly 1)))
        ;; for delivering the final result
        final-chan (async/chan)
        start-time (System/currentTimeMillis)]
    ;; process URLs from step1 and put the result in step2
    (async/pipeline-async step1-n step2-chan retrieve-json-async step1-chan)
    ;; process URLs from step2 and put the result in out
    (async/pipeline-async step2-n out-chan retrieve-json-async step2-chan)
    ;; keep the input channel full until stop-chan is closed.
    (async/go-loop []
      (let [[v c] (async/alts! [stop-chan [step1-chan "1"]])]
        (if (= c stop-chan)
          (async/close! step1-chan)
          (recur))))
    ;; count messages on out-chan until the pipeline is closed, printing
    ;; status message every second
    (async/go-loop [status-timer (async/timeout 1000) subt 0 accu 0]
      (let [[v c] (async/alts! [status-timer out-chan])]
        (cond (= c status-timer)
              (do (println subt "records...")
                  (recur (async/timeout 1000) 0 (+ subt accu)))
              (nil? v)
              (async/>! final-chan (+ subt accu))
              :else
              (recur status-timer (+ v subt) accu))))
    ;; block until done, then emit final report.
    (let [final-total (async/<!! final-chan)
          elapsed-ms  (- (System/currentTimeMillis) start-time)
          elapsed-s   (/ elapsed-ms 1000.0)]
      (print (format "Processed %d records with parallelism %d in %.3f seconds (%d/sec)\n"
                     final-total parallelism elapsed-s
                     (int (/ final-total elapsed-s)))))))

(defn run-for
  [seconds parallelism]
  (let [stop-chan (async/chan)]
    (future
      (Thread/sleep (* seconds 1000))
      (async/close! stop-chan))
    (run parallelism stop-chan)))

(do
  ;; Warm up the connection pool, avoid somaxconn problems...
  (doseq [p (map #(* 20 (inc %)) (range 25))]
    (run-for 1 p))
  (run-for (* 60 60 6) 500))
To test this, I set up an HTTP service that simply responds after sleeping for a random time between 100 and 200 ms. I then ran this program on my MacBook Pro for 6 hours.
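The answer does not show its test service. Here is a hypothetical stand-in built on the JDK's `com.sun.net.httpserver.HttpServer`. It sleeps 100 to 200 ms and answers every path with JSON whose `next` key points to another URL on the same server, so both pipeline steps have something to fetch. The function name and the `/2.json` target are assumptions.

```clojure
(ns example.slow-service
  (:import (com.sun.net.httpserver HttpExchange HttpHandler HttpServer)
           (java.net InetSocketAddress)
           (java.nio.charset StandardCharsets)
           (java.util.concurrent Executors ThreadFactory)))

(defn start-slow-service
  "Hypothetical test service on `port` (0 picks a free port). Every request
  sleeps 100-200 ms, then returns {\"next\": \"http://localhost:<port>/2.json\"}."
  [port]
  (let [server (HttpServer/create (InetSocketAddress. (int port)) 0)
        ;; one pooled thread per sleeping request; daemon so tests can exit
        pool   (Executors/newCachedThreadPool
                (reify ThreadFactory
                  (newThread [_ r] (doto (Thread. ^Runnable r) (.setDaemon true)))))]
    (.createContext server "/"
                    (reify HttpHandler
                      (handle [_ ex]
                        (Thread/sleep (long (+ 100 (rand-int 101))))
                        (let [body (.getBytes (str "{\"next\": \"http://localhost:"
                                                   port "/2.json\"}")
                                              StandardCharsets/UTF_8)]
                          (.sendResponseHeaders ^HttpExchange ex 200 (long (alength body)))
                          (with-open [os (.getResponseBody ^HttpExchange ex)]
                            (.write os body))))))
    (.setExecutor server pool)
    (.start server)
    server))
```

Started with `(start-slow-service 28080)`, it would match the URLs built by `url-of` above.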
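With parallelism set to 500, I averaged 1155 completed transactions per second (2310 completed HTTP requests per second). I'm sure this could go higher with some tuning, particularly by moving the HTTP service to a different machine. JVM memory climbed to 1.5 GB in the first 30 minutes and then stayed at that size. I'm using Oracle's 64-bit 1.8 JVM.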
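Here is a consistency check on these figures with Little's law. It assumes the service latency is uniform between 100 and 200 ms (about 150 ms on average) and ignores client-side overhead:

$$
L = \lambda W \approx 2310\ \tfrac{\text{requests}}{\text{s}} \times 0.15\ \text{s} \approx 347\ \text{requests in flight on average}
$$

That is below the configured parallelism of 500 (250 per step). This suggests latency alone was not saturating the pipelines, which fits the remark that tuning could raise the rate.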
Answer 1 (score: 0)
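An alternative to your first approach (deref-ing the promises http-kit returns inside a go block) is possible. You just need to do it in a way that doesn't block core.async's handler threads on those promises, which you can do by combining http-kit's callbacks with core.async: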
(require '[clojure.core.async :as async]
         '[org.httpkit.client :as http])

(defn handle-event
  "Return a core.async channel that will contain the result of making both HTTP call A and B."
  [event-data]
  (let [event-a-chan (async/chan 1)
        event-b-chan (async/chan 1)]
    ;; http-kit's request takes a single options map (including :url) plus a callback
    (http/request {:url "https://event-a-call" :method :get
                   :query-params {"param1-k" "param1-v"}}
                  (fn [resp] (async/put! event-a-chan resp)))
    (http/request {:url "https://event-b-call" :method :get
                   :query-params {"param1-k" "param1-v"}}
                  (fn [resp] (async/put! event-b-chan resp)))
    ;; the go block parks (no thread is blocked) until both responses arrive;
    ;; the channel returned by go receives the combined map
    (async/go {:event-a-response (async/<! event-a-chan)
               :event-b-response (async/<! event-b-chan)})))
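The same fan-out/fan-in shape can be sketched with only the JDK by swapping core.async channels for `java.util.concurrent.CompletableFuture`. Each callback completes a future, and `thenCombine` builds the combined map once both have completed, without any thread waiting. `callback->future`, `handle-event-cf` and the `start-a!`/`start-b!` arguments are hypothetical names. With http-kit, a start function could look like `(fn [cb] (http/request opts cb))`.

```clojure
(ns example.combine
  (:import (java.util.concurrent CompletableFuture)
           (java.util.function BiFunction)))

(defn callback->future
  "Calls `start!` with a callback; whatever the callback receives completes
  the returned CompletableFuture. Nothing blocks while waiting."
  [start!]
  (let [cf (CompletableFuture.)]
    (start! (fn [resp] (.complete cf resp)))
    cf))

(defn handle-event-cf
  "Starts both calls at once and returns a CompletableFuture of a map holding
  both responses, completed when the later of the two responses arrives."
  [start-a! start-b!]
  (let [^CompletableFuture a (callback->future start-a!)
        b (callback->future start-b!)]
    (.thenCombine a b
                  (reify BiFunction
                    (apply [_ resp-a resp-b]
                      {:event-a-response resp-a
                       :event-b-response resp-b})))))
```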