How do I get latency and throughput performance metrics for Spark Streaming?

Time: 2018-03-01 23:28:25

Tags: apache-spark spark-streaming

I am interested in benchmarking a Spark application, specifically a streaming application. I need the following performance metrics:

  • Latency of the Spark job
  • Throughput: number of records processed per second
  • Latency of writes to HDFS storage

For latency, I am not sure what the ideal metric is for a streaming application. Perhaps the latency to process X GB of data?

I am looking at the eventLog, and I see events like the following:

{"Event":"SparkListenerTaskEnd","Stage ID":0,"Stage Attempt ID":0,"Task Type":"ShuffleMapTask","Task End Reason":{"Reason":"Success"},"Task Info":{"Task ID":0,"Index":0,"Attempt":0,"Launch Time":1519875848448,"Executor ID":"2","Host":"wn4-spark.qnzkwaodrnqefgaghyn41qkb1a.xx.internal.cloudapp.net","Locality":"RACK_LOCAL","Speculative":false,"Getting Result Time":0,"Finish Time":1519875850052,"Failed":false,"Killed":false,"Accumulables":[{"ID":1,"Name":"internal.metrics.executorDeserializeTime","Update":494,"Value":2143,"Internal":true,"Count Failed Values":true},{"ID":2,"Name":"internal.metrics.executorDeserializeCpuTime","Update":51432980,"Value":414880553,"Internal":true,"Count Failed Values":true},{"ID":3,"Name":"internal.metrics.executorRunTime","Update":1022,"Value":5065,"Internal":true,"Count Failed Values":true},{"ID":4,"Name":"internal.metrics.executorCpuTime","Update":133512654,"Value":650061315,"Internal":true,"Count Failed Values":true},{"ID":5,"Name":"internal.metrics.resultSize","Update":2004,"Value":10812,"Internal":true,"Count Failed Values":true},{"ID":6,"Name":"internal.metrics.jvmGCTime","Update":95,"Value":427,"Internal":true,"Count Failed Values":true},{"ID":8,"Name":"internal.metrics.memoryBytesSpilled","Update":0,"Value":0,"Internal":true,"Count Failed Values":true},{"ID":9,"Name":"internal.metrics.diskBytesSpilled","Update":0,"Value":0,"Internal":true,"Count Failed Values":true},{"ID":18,"Name":"internal.metrics.shuffle.write.bytesWritten","Update":1627,"Value":10395,"Internal":true,"Count Failed Values":true},{"ID":19,"Name":"internal.metrics.shuffle.write.recordsWritten","Update":12,"Value":59,"Internal":true,"Count Failed Values":true},{"ID":20,"Name":"internal.metrics.shuffle.write.writeTime","Update":11705346,"Value":54412673,"Internal":true,"Count Failed Values":true}]},"Task Metrics":{"Executor Deserialize Time":494,"Executor Deserialize CPU Time":51432980,"Executor Run Time":1022,"Executor CPU Time":133512654,"Result Size":2004,"JVM GC Time":95,"Result Serialization Time":0,"Memory Bytes Spilled":0,"Disk Bytes Spilled":0,"Shuffle Read Metrics":{"Remote Blocks Fetched":0,"Local Blocks Fetched":0,"Fetch Wait Time":0,"Remote Bytes Read":0,"Local Bytes Read":0,"Total Records Read":0},"Shuffle Write Metrics":{"Shuffle Bytes Written":1627,"Shuffle Write Time":11705346,"Shuffle Records Written":12},"Input Metrics":{"Bytes Read":0,"Records Read":0},"Output Metrics":{"Bytes Written":0,"Records Written":0},"Updated Blocks":[]}}
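One way to turn events like this into a latency number might be to scan the event log for `SparkListenerTaskEnd` entries and subtract `Launch Time` from `Finish Time`. This is only a minimal Python sketch: it assumes the log is a text file with one JSON object per line, and the function name `task_latencies_ms` and the trimmed `sample` event are illustrative, not part of Spark.

```python
import json

def task_latencies_ms(lines):
    """Yield (task_id, latency_ms) for each successful SparkListenerTaskEnd event."""
    for line in lines:
        try:
            event = json.loads(line)
        except ValueError:
            continue  # skip blank lines or lines that are not valid JSON
        if not isinstance(event, dict):
            continue
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        info = event.get("Task Info", {})
        if info.get("Failed") or info.get("Killed"):
            continue  # ignore tasks that did not complete normally
        # Both timestamps are epoch milliseconds in the event shown above.
        yield info["Task ID"], info["Finish Time"] - info["Launch Time"]

# Trimmed copy of the event above, keeping only the fields used here.
sample = [
    '{"Event":"SparkListenerTaskEnd","Task Info":{"Task ID":0,'
    '"Launch Time":1519875848448,"Finish Time":1519875850052,'
    '"Failed":false,"Killed":false}}'
]
print(list(task_latencies_ms(sample)))  # [(0, 1604)]
```

This gives per-task wall-clock time only; by itself it does not give the end-to-end latency of a streaming batch.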

It looks like I can get shuffle performance metrics from events like the one above:

{"ID":18,"Name":"internal.metrics.shuffle.write.bytesWritten","Update":2086,"Value":12481,"Internal":true,"Count Failed Values":true},{"ID":19,"Name":"internal.metrics.shuffle.write.recordsWritten","Update":12,"Value":71,"Internal":true,"Count Failed Values":true},{"ID":20,"Name":"internal.metrics.shuffle.write.writeTime","Update":7177291,"Value":61589964,"Internal":true,"Count Failed Values":true}]}
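A rough throughput figure could perhaps be derived from the same `SparkListenerTaskEnd` events by summing `Shuffle Records Written` and dividing by the wall-clock span between the earliest `Launch Time` and the latest `Finish Time`. The Python sketch below assumes one JSON event per line and assumes that shuffle records written is an acceptable stand-in for "records processed"; `Records Read` under `Input Metrics` might be a better choice for some jobs. The function name and the sample events are hypothetical.

```python
import json

def shuffle_write_throughput(lines):
    """Return shuffle records written per second of wall-clock time over TaskEnd events."""
    records, first_launch, last_finish = 0, None, None
    for line in lines:
        try:
            event = json.loads(line)
        except ValueError:
            continue  # skip lines that are not valid JSON
        if not isinstance(event, dict) or event.get("Event") != "SparkListenerTaskEnd":
            continue
        info = event["Task Info"]
        metrics = event.get("Task Metrics") or {}  # may be absent for failed tasks
        records += metrics.get("Shuffle Write Metrics", {}).get("Shuffle Records Written", 0)
        launch, finish = info["Launch Time"], info["Finish Time"]
        first_launch = launch if first_launch is None else min(first_launch, launch)
        last_finish = finish if last_finish is None else max(last_finish, finish)
    if first_launch is None or last_finish == first_launch:
        return 0.0
    # Timestamps are epoch milliseconds, so convert the span to seconds.
    return records / ((last_finish - first_launch) / 1000.0)

# Made-up events for illustration only: 20 records over a 2-second span.
demo = [
    '{"Event":"SparkListenerTaskEnd","Task Info":{"Launch Time":1000,"Finish Time":2000},'
    '"Task Metrics":{"Shuffle Write Metrics":{"Shuffle Records Written":12}}}',
    '{"Event":"SparkListenerTaskEnd","Task Info":{"Launch Time":1500,"Finish Time":3000},'
    '"Task Metrics":{"Shuffle Write Metrics":{"Shuffle Records Written":8}}}',
]
print(shuffle_write_throughput(demo))  # 10.0
```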

How can I get the numbers I need, as listed above?

0 answers:

No answers