Reformatting the Spark Streaming DStream count output produced by print

Posted: 2017-07-31 14:19:01

Tags: scala spark-streaming

I use this line to print the count of each RDD in the stream:

myDStream.count.print

I get output like this:

-------------------------------------------
Time: 1501499254000 ms
-------------------------------------------
2

-------------------------------------------
Time: 1501499256000 ms
-------------------------------------------
0

-------------------------------------------
Time: 1501499258000 ms
-------------------------------------------
0

I would just like to reformat the message so that it looks like this:

-------------------------------------------
Time: 1501499254000 ms
-------------------------------------------
log.info Got new batch with 2 messages

-------------------------------------------
Time: 1501499256000 ms
-------------------------------------------
log.info Got new batch with 0 messages

-------------------------------------------
Time: 1501499258000 ms
-------------------------------------------
log.info Got new batch with 0 messages

Do you have any ideas?

1 answer:

Answer 0 (score: 2)

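The implementation of print is fixed: its output format is hard-coded and cannot be customized. If we want different output, we need to roll our own implementation: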

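// Use the two-argument overload of foreachRDD to get the batch time along with the RDD
dstream.foreachRDD{(rdd, time) =>
    val count = rdd.count()
    println("-------------------------------------------")
    // Time.toString already renders as "<millis> ms", matching print's header
    println(s"Time: $time")
    println("-------------------------------------------")
    println(s"log.info Got new batch with $count messages")
}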
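
To keep the formatting separate from the streaming code, the banner can be built by a small pure function, which is easy to check without a running StreamingContext. The sketch below is only one possible arrangement: `BatchReport` and `format` are hypothetical names, not part of Spark, and the usage comment assumes the DStream is called `myDStream` as in the question.

```scala
// Hypothetical helper (not part of Spark): rebuilds the header that
// DStream.print emits and appends a custom message line.
object BatchReport {
  // Separator copied from the output shown in the question.
  private val Rule = "-------------------------------------------"

  // timeMs: batch time in milliseconds; count: number of records in the batch.
  def format(timeMs: Long, count: Long): String =
    Seq(
      Rule,
      s"Time: $timeMs ms",
      Rule,
      s"log.info Got new batch with $count messages",
      "" // trailing blank line, as print leaves one between batches
    ).mkString("\n")
}

// Usage inside the streaming job (assumes a DStream named myDStream):
// myDStream.foreachRDD { (rdd, time) =>
//   println(BatchReport.format(time.milliseconds, rdd.count()))
// }
```

If the goal is a real log entry rather than the literal text "log.info", the string returned by `format` could be passed to whatever logger the application already uses instead of `println`.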