
I am running a Spark Streaming job and have been struggling to get the right throughput. Here are some details about the job:

Batch window: 30 seconds
Processing 5 Kafka topics with 20 partitions each
Kafka message size: 100 to 300 bytes
10 executors with 3 GB memory each
A driver with 4 GB memory
Backpressure is enabled
Expected throughput: 4000 - 6000 messages/second
The Spark job writes data to Flume and an Axibase socket
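
To put these numbers in perspective, here is a rough back-of-the-envelope sketch of what a single 30-second batch would have to carry. It uses only the figures listed above; the even spread of messages across partitions is an assumption made for illustration, not something measured on the job.

```python
# Figures taken from the job description above.
TOPICS = 5
PARTITIONS_PER_TOPIC = 20
BATCH_WINDOW_S = 30
RATE_MSGS_PER_S = (4000, 6000)   # expected throughput range
MSG_SIZE_BYTES = (100, 300)      # per-message size range

total_partitions = TOPICS * PARTITIONS_PER_TOPIC

# Messages arriving per batch at the low and high ends of the expected rate.
msgs_per_batch = tuple(rate * BATCH_WINDOW_S for rate in RATE_MSGS_PER_S)

# Payload per batch: smallest messages at the lowest rate,
# largest messages at the highest rate.
bytes_per_batch = (
    msgs_per_batch[0] * MSG_SIZE_BYTES[0],
    msgs_per_batch[1] * MSG_SIZE_BYTES[1],
)

# ASSUMPTION: messages are spread evenly across all partitions.
msgs_per_partition = tuple(m // total_partitions for m in msgs_per_batch)

print(total_partitions, msgs_per_batch, bytes_per_batch, msgs_per_partition)
```

Under these assumptions a batch holds roughly 120,000 to 180,000 messages, or about 12 to 54 MB of payload, spread over 100 Kafka partitions.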

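I observe that the processing time of each batch exceeds 30 seconds (34, 35, 36 seconds or more), but when I look inside an individual batch, the duration shows only 15 to 20 seconds. I am not sure what causes this mismatch, or why the overall batch processing time is higher than the batch window. As a result, the batches are queuing up.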
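
The queuing itself can be illustrated with a small simulation. This is only a sketch: it assumes batches are processed one at a time, and the function name `simulate_delays` and the sample processing times are hypothetical, chosen to resemble the 34 to 36 seconds reported above. It does not explain the gap between the batch time and the 15 to 20 second durations; it only shows how the scheduling delay grows once the processing time exceeds the 30-second window.

```python
def simulate_delays(batch_interval_s, processing_times_s):
    """Return (scheduling_delay, total_delay) per batch.

    ASSUMPTION: batches run strictly one after another, so a batch
    cannot start until the previous one has finished.
    """
    results = []
    previous_finish = 0
    for i, processing in enumerate(processing_times_s):
        # Batch i becomes ready at the end of its collection window.
        submitted = (i + 1) * batch_interval_s
        # It starts when it is ready AND the previous batch is done.
        start = max(submitted, previous_finish)
        finish = start + processing
        scheduling_delay = start - submitted
        total_delay = finish - submitted  # scheduling delay + processing time
        results.append((scheduling_delay, total_delay))
        previous_finish = finish
    return results


# Hypothetical processing times (seconds), each above the 30 s window.
delays = simulate_delays(30, [34, 35, 36, 35])
for scheduling_delay, total_delay in delays:
    print(scheduling_delay, total_delay)
```

With these sample values the scheduling delay rises with every batch, from 0 to 4, 9, and then 15 seconds, which matches the queuing behaviour described above.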

Any help identifying the problem would be appreciated. Thanks in advance.

Spark batch processing time shows higher than the actual execution duration
