I have the following scenario in my project.
Example:
At 10:00:00, compute and output:
10:00:00 key = 1, mean = 10, variance = 2
10:00:00 key = N, mean = 10, variance = 2
At 10:00:01, compute and output:
10:00:01 key = 1, mean = 11, variance = 2
10:00:01 key = N, mean = 12, variance = 2
The 10:00:01 data has no dependency on the 10:00:00 data.
How can I set up such a job in a single Java Spark Streaming application?
Answer 0 (score: 0)
There are two options:
First, you can set the batch duration to 60 seconds; the data received from the stream in each batch can then be used to compute the mean and variance.
Second, you can leverage the DStream.window function. For example, see the pseudocode below:
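For the first option, the per-key mean and variance over one batch could be computed as in the following plain-Java sketch (no Spark dependency; the map-of-lists record layout and the method name `meanAndVariance` are illustrative assumptions, not from the question):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchStats {
    // Compute the mean and population variance for every key in one batch.
    // Input layout (key -> list of observed values) is an assumption for illustration.
    static Map<Integer, double[]> meanAndVariance(Map<Integer, List<Double>> batch) {
        Map<Integer, double[]> out = new HashMap<>();
        for (Map.Entry<Integer, List<Double>> e : batch.entrySet()) {
            List<Double> vals = e.getValue();
            double mean = vals.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
            double var = vals.stream()
                             .mapToDouble(v -> (v - mean) * (v - mean))
                             .average().orElse(0.0);
            out.put(e.getKey(), new double[] { mean, var });
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, List<Double>> batch = new HashMap<>();
        batch.put(1, Arrays.asList(8.0, 10.0, 12.0));
        double[] s = meanAndVariance(batch).get(1);
        // prints key=1 mean=10.0 variance=2.67
        System.out.printf("key=1 mean=%.1f variance=%.2f%n", s[0], s[1]);
    }
}
```

In a real application this computation would run inside the foreachRDD/transform callback on each 60-second batch.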
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

val streamCtx = new StreamingContext(sparkCtx, Seconds(10))
// Assuming that you are using Flume and creating a polling stream
val flumeStream = FlumeUtils.createPollingStream(streamCtx, <Array[InetSocketAddress]>, <StorageLevel>, 1000, 1)
// A 60-second window sliding every 60 seconds produces non-overlapping windows,
// so each window's computation is independent of the previous one
val windowStream = flumeStream.window(Seconds(60), Seconds(60))
// Now process windowStream and calculate the mean and variance of each key
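One way to compute those per-key statistics is to update them one value at a time with Welford's online algorithm. The following plain-Java sketch is my substitution, not part of the original answer (no Spark dependency; the class and field names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class RunningStats {
    // Welford's online algorithm: maintains mean and variance incrementally,
    // so per-key statistics can be updated as each record of a window arrives.
    static class Stats {
        long n = 0;
        double mean = 0.0;
        double m2 = 0.0;  // running sum of squared deviations from the current mean

        void add(double x) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }

        // Population variance; returns 0 for an empty series.
        double variance() { return n == 0 ? 0.0 : m2 / n; }
    }

    public static void main(String[] args) {
        Map<Integer, Stats> byKey = new HashMap<>();
        for (double v : new double[] { 8.0, 10.0, 12.0 }) {
            byKey.computeIfAbsent(1, k -> new Stats()).add(v);
        }
        Stats s = byKey.get(1);
        // prints key=1 mean=10.0 variance=2.67
        System.out.printf("key=1 mean=%.1f variance=%.2f%n", s.mean, s.variance());
    }
}
```

Keeping such a running accumulator per key avoids holding all of a window's raw values in memory at once.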
For more details, see the DStream API.