What is the difference in latency between Storm and Spark Streaming when processing tuples/messages?

Time: 2017-05-29 01:54:14

Tags: apache-spark apache-storm

1. Based on the description linked below, do both Storm and Spark Streaming process messages/tuples in batches or small/micro-batches? https://storm.apache.org/releases/2.0.0-SNAPSHOT/Trident-tutorial.html

2. If the answer to the question above is yes, does that mean both technologies incur a delay when processing messages/tuples? If so, why do I often hear that Storm's latency is better than Spark Streaming's, as in the article below? https://www.ericsson.com/research-blog/data-knowledge/apache-storm-vs-spark-streaming/

3. The Trident tutorial states: "Generally the size of those small batches will be on the order of thousands or millions of tuples, depending on your incoming throughput." So what is the actual size of a small batch: thousands or millions of tuples? If batches are that large, how can Storm keep latency low?

https://storm.apache.org/releases/2.0.0-SNAPSHOT/Trident-tutorial.html

1 answer:

Answer 0 (score: 1)

Storm's core API tries to process each event as it arrives. It is an event-at-a-time processing model, which can result in very low latency.

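Storm's Trident is a micro-batching model built on top of Storm's core APIs to provide exactly-once guarantees. Spark Streaming is also based on micro-batching and is comparable to Trident in terms of latency.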
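To make the difference concrete, here is a toy model in Python (not Storm or Spark code) of why micro-batching adds latency. It assumes an event in a micro-batch system cannot be processed until its batch closes, and it ignores queueing, scheduling, and network costs. The function names, the 500 ms batch interval, and the 5 ms processing time are all made up for illustration.

```python
def per_event_latency(arrival_times_ms, processing_ms):
    """Event-at-a-time model: each event is handled as soon as it arrives."""
    return [processing_ms for _ in arrival_times_ms]


def micro_batch_latency(arrival_times_ms, batch_interval_ms, processing_ms):
    """Micro-batch model: each event waits for its batch to close first."""
    latencies = []
    for t in arrival_times_ms:
        # The batch that contains t closes at the next interval boundary.
        batch_close = (t // batch_interval_ms + 1) * batch_interval_ms
        latencies.append(batch_close - t + processing_ms)
    return latencies


def mean(values):
    return sum(values) / len(values)


# Hypothetical workload: one event every 100 ms for one second.
arrivals = list(range(0, 1000, 100))
event_at_a_time = per_event_latency(arrivals, processing_ms=5)
micro_batched = micro_batch_latency(arrivals, batch_interval_ms=500, processing_ms=5)

print(mean(event_at_a_time))  # mean latency in ms, event-at-a-time
print(mean(micro_batched))    # mean latency in ms, micro-batched
```

In this model the extra latency depends on the batch interval, not on how many tuples land in a batch, so a batch can hold thousands of tuples at high throughput and still close quickly. A real benchmark is still needed for actual numbers.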

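So if you are looking for extremely low-latency processing, Storm's core API is the way to go. However, it guarantees only at-least-once processing: duplicate events may be delivered after a failure, and the application is expected to handle them.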
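One common way for an application to cope with duplicates is to make processing idempotent. The sketch below is plain Python, not the Storm API. It assumes every event carries a unique ID assigned at the source (`event_id` is a hypothetical field) and that the set of seen IDs fits in memory. A real topology would put similar logic inside a bolt and would likely bound the set or keep it in an external store.

```python
class IdempotentWordCounter:
    """Counts words while ignoring events whose ID was already processed."""

    def __init__(self):
        self.seen_ids = set()  # assumption: unbounded, in-memory
        self.counts = {}

    def process(self, event_id, word):
        """Apply the event once; return False if it was a duplicate."""
        if event_id in self.seen_ids:
            return False  # replayed event: skip it so the count stays correct
        self.seen_ids.add(event_id)
        self.counts[word] = self.counts.get(word, 0) + 1
        return True


counter = IdempotentWordCounter()
# Event 2 is delivered twice, as could happen when a tuple is replayed.
events = [(1, "storm"), (2, "spark"), (2, "spark"), (3, "storm")]
results = [counter.process(event_id, word) for event_id, word in events]

print(results)
print(counter.counts)
```

The trade-off is extra state and one lookup per event, which is the kind of bookkeeping Trident handles for you at the cost of micro-batch latency.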

The streaming benchmark from Yahoo [1] can provide more insight.

[1] https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at