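I'm trying to get started with Onyx, the distributed computing platform for Clojure. In particular, I'm trying to understand how to aggregate data. If I understand the documentation correctly, this should be possible by combining a window with a :trigger/emit function.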
So I modified the aggregation example (Onyx 0.13.0) in three ways (see the gist for the complete code):
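In -main, I println all segments that are put on the output channel. With the original code this works as expected: it picks up every segment and prints it to stdout.
I added an emit function like this: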
(defn make-ds
  [event window trigger {:keys [lower-bound upper-bound event-type] :as state-event} extent-state]
  (println "make-ds called")
  {:ds window})
I added a trigger configuration (omitting the original dump-words trigger for brevity):
(def triggers
  [{:trigger/window-id :word-counter
    :trigger/id :make-ds
    :trigger/on :onyx.triggers/segment
    :trigger/fire-all-extents? true
    :trigger/threshold [5 :elements]
    :trigger/emit ::make-ds}])
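I changed the :count-words task from one that calls identity to one of type reduce, so that it no longer hands every input segment on to the output (and I added the configuration options that should make Onyx process segments in batches):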
{:onyx/name :count-words
 ;:onyx/fn :clojure.core/identity
 :onyx/type :reduce ; :function
 :onyx/group-by-key :word
 :onyx/flux-policy :kill
 :onyx/min-peers 1
 :onyx/max-peers 1
 :onyx/batch-size 1000
 :onyx/batch-fn? true}
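When I run this, the output shows that the emit function (make-ds) is called once for every input segment (the first block of output comes from the original code's dump-words trigger):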
> lein run
[....]
Om -> 1
name -> 1
My -> 2
a -> 1
gone -> 1
Coffee -> 1
to -> 1
get -> 1
Time -> 1
make-ds called
make-ds called
make-ds called
make-ds called
[....]
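However, the segments built by make-ds never make it to the output channel, so they are never printed. If I switch the :count-words task back to the identity function, everything works. Also, the emit function appears to be called for every input segment, whereas I would expect it to be called only when the threshold condition is true (that is, each time 5 elements have been aggregated in the window).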
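To make the expected behaviour concrete, here is a toy model in plain Clojure. It is not Onyx code and does not claim to reproduce how Onyx implements triggers; simulate-threshold-trigger is a hypothetical helper that aggregates word counts segment by segment and calls an emit function only after every n new segments, which is what a threshold of [5 :elements] is expected to do.

```clojure
;; Toy model, NOT Onyx's implementation: it only illustrates the behaviour
;; expected from a segment trigger with :trigger/threshold [5 :elements].
(defn simulate-threshold-trigger
  "Aggregates word counts segment by segment and calls emit-fn with the
  current aggregate each time n new segments have arrived since the last
  firing. Returns a vector of everything emit-fn returned."
  [n emit-fn segments]
  (loop [segs    segments
         seen    0    ; segments seen since the last firing
         state   {}   ; word -> count; the aggregate itself is never reset here
         emitted []]
    (if-let [[seg & more] (seq segs)]
      (let [state (update state (:word seg) (fnil inc 0))
            seen  (inc seen)]
        (if (= seen n)
          ;; threshold reached: fire, then reset only the trigger counter
          (recur more 0 state (conj emitted (emit-fn state)))
          (recur more seen state emitted)))
      emitted)))

;; 12 segments with a threshold of 5: emit-fn runs after segments 5 and 10.
(def segments
  (map (fn [w] {:word w})
       ["My" "name" "Om" "a" "My" "Time" "to" "get" "Coffee" "gone" "Om" "name"]))

(count (simulate-threshold-trigger 5 identity segments))
;; => 2
```

With 12 input segments the emit function runs twice rather than 12 times, which is the gap between what the question expects and the "make-ds called" line printed per segment.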
Since the test for this feature in the Onyx code base (onyx.windowing.emit-aggregate-test) passes without problems, I assume I'm making a silly mistake somewhere, but I can't find it.
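Independently of why the segments are lost, note that make-ds emits {:ds window}, which is the window definition map rather than the aggregated data. A variant that forwards the aggregate could look like the sketch below. It keeps the question's argument list and assumes, as the parameter name suggests, that extent-state holds the aggregated state of the extent that fired; the output keys :window-id, :bounds and :aggregate are illustrative names chosen here, not an Onyx convention.

```clojure
;; Variant of make-ds with the same argument list as in the question.
;; Assumption: extent-state is the aggregate for the extent that fired.
;; The keys of the returned map are hypothetical, chosen for illustration.
(defn make-ds
  [event window trigger {:keys [lower-bound upper-bound] :as state-event} extent-state]
  {:window-id (:window/id window)        ; id of the window definition, e.g. :word-counter
   :bounds    [lower-bound upper-bound]  ; extent bounds taken from the state event
   :aggregate extent-state})             ; the aggregated value itself
```

For example, calling it with a window map {:window/id :word-counter}, a state event with bounds 0 and 10, and an extent state of {"My" 2} returns {:window-id :word-counter, :bounds [0 10], :aggregate {"My" 2}}.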
Answer
I finally noticed warnings like these in the log file onyx.log:
[clojure.lang.ExceptionInfo: Windows cannot be checkpointed with ZooKeeper unless
:onyx.peer/storage.zk.insanely-allow-windowing? is set to true in the peer config.
This should only be turned on as a development convenience.
[clojure.lang.ExceptionInfo: Handling uncaught exception thrown inside task
lifecycle :lifecycle/checkpoint-state. Killing the job. -> Exception type:
clojure.lang.ExceptionInfo. Exception message: Windows cannot be checkpointed with
ZooKeeper unless :onyx.peer/storage.zk.insanely-allow-windowing? is set to true in
the peer config. This should only be turned on as a development convenience.
Once I set this option, segments were finally handed on to the next task. That is, I had to change the peer config to:
(def peer-config
  {:zookeeper/address "127.0.0.1:2189"
   :onyx/tenancy-id id
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   :onyx.peer/storage.zk.insanely-allow-windowing? true
   :onyx.messaging/impl :aeron
   :onyx.messaging/peer-port 40200
   :onyx.messaging/bind-addr "localhost"})
Now, :onyx.peer/storage.zk.insanely-allow-windowing? does not sound like something you want to leave on. On the Clojurians Slack channel, Lucas Bradstreet suggested switching to S3 checkpointing.
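A sketch of what the peer config might look like with S3 checkpointing instead of the ZooKeeper override. The key names :onyx.peer/storage, :onyx.peer/storage.s3.bucket and :onyx.peer/storage.s3.region are assumptions based on the Onyx peer configuration documentation and should be checked against the Onyx version in use; the bucket name and region are placeholders, and id is the tenancy id from the original config.

```clojure
;; Sketch only: the :onyx.peer/storage* key names are assumed and should be
;; verified against the peer configuration docs for your Onyx version.
(def peer-config
  {:zookeeper/address "127.0.0.1:2189"
   :onyx/tenancy-id id
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   ;; checkpoint window state to S3 instead of ZooKeeper,
   ;; so insanely-allow-windowing? is no longer needed
   :onyx.peer/storage :s3
   :onyx.peer/storage.s3.bucket "my-onyx-checkpoints" ; hypothetical bucket name
   :onyx.peer/storage.s3.region "us-east-1"           ; hypothetical region
   :onyx.messaging/impl :aeron
   :onyx.messaging/peer-port 40200
   :onyx.messaging/bind-addr "localhost"})
```

AWS credentials for the peers would also have to be available in whatever way the chosen storage settings expect.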