Onyx: Can't pick up trigger/emit results in the next task

Asked: 2018-06-26 08:27:38

Tags: clojure onyx-platform

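I'm trying to get started with Onyx, the distributed computing platform written in Clojure. In particular, I'm trying to understand how to aggregate data. If I understand the documentation correctly, combining a window with a :trigger/emit function should let me do this.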

So I modified the aggregation example (Onyx 0.13.0) in the following ways (see the gist with the complete code):

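  • In -main, I println every segment that is put on the output channel. With the original code this works as expected: it picks up all segments and prints them to stdout.
  • I added an emit function like this: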

    (defn make-ds
      [event window trigger {:keys [lower-bound upper-bound event-type] :as state-event} extent-state]
      (println "make-ds called")
      {:ds window})
    
  • I added a trigger configuration (the original dump-words trigger is omitted for brevity):

    (def triggers
     [{:trigger/window-id :word-counter
       :trigger/id :make-ds
       :trigger/on :onyx.triggers/segment
       :trigger/fire-all-extents? true
       :trigger/threshold [5 :elements]
       :trigger/emit ::make-ds}])
    
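  • I changed the :count-words task from a function task calling identity to a task of type :reduce, so that it no longer hands every input segment over to the output (and I added config options so that Onyx processes the input in batches):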

        {:onyx/name :count-words
         ;:onyx/fn :clojure.core/identity
         :onyx/type :reduce ; :function
         :onyx/group-by-key :word
         :onyx/flux-policy :kill
         :onyx/min-peers 1
         :onyx/max-peers 1
         :onyx/batch-size 1000
         :onyx/batch-fn? true}  
    

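Running this now, I can see in the output that the emit function (i.e. make-ds) is called once for each input segment (the first lines of output come from the dump-words trigger of the original code):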

     > lein run
     [....]
     Om -> 1
     name -> 1
     My -> 2
     a -> 1
     gone -> 1
     Coffee -> 1
     to -> 1
     get -> 1
     Time -> 1
     make-ds called
     make-ds called
     make-ds called
     make-ds called
     [....]

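However, the segments built by make-ds never make it through to the output channel, so they are never printed. If I revert the :count-words task to the identity function, this works fine. Also, it looks as if the emit function is called for every input segment, whereas I would expect it to be called only when the threshold condition is true (i.e. whenever 5 elements have been aggregated in the window).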
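
To make that expectation concrete, here is a minimal sketch in plain Clojure of a segment-count trigger with a threshold of 5. It is not Onyx code and does not reflect how Onyx implements :onyx.triggers/segment; the function name simulate-segment-trigger and the sample word list are made up for illustration, and the sketch ignores grouping and batching entirely.

```clojure
;; Hypothetical, Onyx-free illustration of the *expected* behaviour:
;; keep a running word count and emit a {:ds ...} value only on every
;; `threshold`-th segment.
(defn simulate-segment-trigger
  [threshold segments]
  (:emitted
   (reduce (fn [{:keys [counts seen emitted]} {:keys [word]}]
             (let [counts' (update counts word (fnil inc 0))
                   seen'   (inc seen)]
               {:counts  counts'
                :seen    seen'
                ;; fire only when the number of segments seen so far
                ;; is a multiple of the threshold
                :emitted (if (zero? (mod seen' threshold))
                           (conj emitted {:ds counts'})
                           emitted)}))
           {:counts {} :seen 0 :emitted []}
           segments)))

;; Sample input, loosely based on the words in the output above.
(def words ["My" "name" "My" "a" "gone" "Coffee" "to" "get" "Time" "Om"])

(simulate-segment-trigger 5 (map (fn [w] {:word w}) words))
```

With ten input segments and a threshold of 5, this produces two emitted values rather than ten, which is the behaviour I expected from the trigger.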

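Since the test for this functionality in the Onyx code base (onyx.windowing.emit-aggregate-test) passes just fine, I guess I'm making a silly mistake somewhere, but I'm at a loss as to what it is.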

1 answer:

Answer 0 (score: 0)

I eventually noticed a warning like this in the log file onyx.log:

[clojure.lang.ExceptionInfo: Windows cannot be checkpointed with ZooKeeper unless 
  :onyx.peer/storage.zk.insanely-allow-windowing? is set to true in the peer config.
  This should only be turned on as a development convenience.
[clojure.lang.ExceptionInfo: Handling uncaught exception thrown inside task 
  lifecycle :lifecycle/checkpoint-state. Killing the job. -> Exception type: 
  clojure.lang.ExceptionInfo. Exception message: Windows cannot be checkpointed with
  ZooKeeper unless :onyx.peer/storage.zk.insanely-allow-windowing? is set to true in
  the peer config. This should only be turned on as a development convenience.

After setting this option, segments were finally handed over to the next task. That is, I had to change the peer config to:

(def peer-config
  {:zookeeper/address "127.0.0.1:2189"
   :onyx/tenancy-id id
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   :onyx.peer/storage.zk.insanely-allow-windowing? true
   :onyx.messaging/impl :aeron
   :onyx.messaging/peer-port 40200
   :onyx.messaging/bind-addr "localhost"})

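Now, :onyx.peer/storage.zk.insanely-allow-windowing? doesn't sound like something you should leave switched on. Lucas Bradstreet recommended on the Clojurians Slack channel switching to S3 checkpointing instead.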
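
I have not tried the S3 route myself, so the following is only a sketch of what such a peer config might look like. The :onyx.peer/storage and :onyx.peer/storage.s3.* keys are written as I recall them from the Onyx peer-config documentation and should be checked against the cheat sheet for your Onyx version; the function name s3-peer-config, the bucket name and the region are placeholders.

```clojure
;; Sketch only: key names are an assumption to verify against the Onyx docs.
(defn s3-peer-config
  [id]
  {:zookeeper/address "127.0.0.1:2189"
   :onyx/tenancy-id id
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   ;; assumption: store window checkpoints in S3 instead of ZooKeeper
   :onyx.peer/storage :s3
   :onyx.peer/storage.s3.bucket "my-onyx-checkpoints" ; hypothetical bucket
   :onyx.peer/storage.s3.region "us-east-1"           ; hypothetical region
   :onyx.messaging/impl :aeron
   :onyx.messaging/peer-port 40200
   :onyx.messaging/bind-addr "localhost"})
```

With a config like this, the :onyx.peer/storage.zk.insanely-allow-windowing? flag should no longer be needed, since windows would not be checkpointed to ZooKeeper.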