很抱歉,如果这个问题看起来无效,我试图找到调试任务处理时间的一般指导,但还没有找到。我认为我的问题是一个已知问题,所以任何帮助调试问题或理解问题(相关讨论或博客文章)都会回答我的问题。
我制作了多个流式火花工作,几乎所有人都遇到同样的问题;每个执行程序中的一个任务比其他所有任务花费更长的时间:
我的工作流程是直接Kafka流源平面映射(mapParitionsWithPair ( flatMap )
),有四十个分区,可以从事件中生成更多对象,然后减少它们(reduceByKey
)并将聚合值保存到某个DB:
任务时间表数字用于减少阶段。
这是一个基于Apache Mesos的集群,每个节点有两个节点和两个核心,所有作业的第二阶段都有不均匀的任务处理时间分布。
更新:
reduceByKey
,但仍然出现同样的问题。更新:
我通过在每个分区中使用Java 8 Stream reduce来改变工作流程以避免数据混乱。这是改变了工作的DAG:
我将批处理间隔增加到20秒并添加了更多节点;现在,不仅有一个缓慢的任务,而且任务更慢,速度更快,但是:
只是在每个分区中放置一些记录实际操作的内容,我看到奇怪的是,有时候任务很慢,有时很快。当任务进行缓慢时,CPU处于空闲状态,我看不到任何网络或CPU I / O阻塞。内存使用率恒定为%50。这里提到执行程序日志:
started processing partitioned input: thread 99
started processing partitioned input: thread 98
finished processing partitioned input: thread 99 took 40615ms
finished processing partitioned input: thread 98 took 40469ms
started processing partitioned input: thread 98
started processing partitioned input: thread 99
finished processing partitioned input: thread 98 took 40476ms
finished processing partitioned input: thread 99 took 40523ms
started processing partitioned input: thread 98
started processing partitioned input: thread 99
finished processing partitioned input: thread 98 40465ms
finished processing partitioned input: thread 99 40379ms
started processing partitioned input: thread 98
finished processing partitioned input: thread 98 468
started processing partitioned input: thread 99
finished processing partitioned input: thread 99 525
started processing partitioned input: thread 99
started processing partitioned input: thread 98
finished processing partitioned input: thread 98 738
finished processing partitioned input: thread 99 790
started processing partitioned input: thread 98
finished processing partitioned input: thread 98 took 558
started processing partitioned input: thread 99
finished processing partitioned input: thread 99 took 461
started processing partitioned input: thread 98
finished processing partitioned input: thread 98 took 483
started processing partitioned input: thread 99
finished processing partitioned input: thread 99 took 513
started processing partitioned input: thread 98
finished processing partitioned input: thread 98 took 485
started processing partitioned input: thread 99
finished processing partitioned input: thread 99 took 454
以上日志仅用于将一些传入输入映射到对象以便在Cassandra中保存,并且不包括保存到Cassandra的时间;这是保存操作的日志,它总是很快并且不会让CPU空闲:
18/02/07 07:41:47 INFO Executor: Running task 17.0 in stage 5.0 (TID 207)
18/02/07 07:41:47 INFO TorrentBroadcast: Started reading broadcast variable 5
18/02/07 07:41:47 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 7.8 KB, free 1177.1 MB)
18/02/07 07:41:47 INFO TorrentBroadcast: Reading broadcast variable 5 took 33 ms
18/02/07 07:41:47 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 16.4 KB, free 1177.1 MB)
18/02/07 07:41:47 INFO BlockManager: Found block rdd_30_2 locally
18/02/07 07:41:47 INFO BlockManager: Found block rdd_30_17 locally
18/02/07 07:42:02 INFO TableWriter: Wrote 28926 rows to keyspace.table in 15.749 s.
18/02/07 07:42:02 INFO Executor: Finished task 17.0 in stage 5.0 (TID 207). 923 bytes result sent to driver
18/02/07 07:42:02 INFO CoarseGrainedExecutorBackend: Got assigned task 209
18/02/07 07:42:02 INFO Executor: Running task 18.0 in stage 5.0 (TID 209)
18/02/07 07:42:02 INFO BlockManager: Found block rdd_30_18 locally
18/02/07 07:42:03 INFO TableWriter: Wrote 29288 rows to keyspace.table in 16.042 s.
18/02/07 07:42:03 INFO Executor: Finished task 2.0 in stage 5.0 (TID 203). 1713 bytes result sent to driver
18/02/07 07:42:03 INFO CoarseGrainedExecutorBackend: Got assigned task 211
18/02/07 07:42:03 INFO Executor: Running task 21.0 in stage 5.0 (TID 211)
18/02/07 07:42:03 INFO BlockManager: Found block rdd_30_21 locally
18/02/07 07:42:19 INFO TableWriter: Wrote 29315 rows to keyspace.table in 16.308 s.
18/02/07 07:42:19 INFO Executor: Finished task 21.0 in stage 5.0 (TID 211). 923 bytes result sent to driver
18/02/07 07:42:19 INFO CoarseGrainedExecutorBackend: Got assigned task 217
18/02/07 07:42:19 INFO Executor: Running task 24.0 in stage 5.0 (TID 217)
18/02/07 07:42:19 INFO BlockManager: Found block rdd_30_24 locally
18/02/07 07:42:19 INFO TableWriter: Wrote 29422 rows to keyspace.table in 16.783 s.
18/02/07 07:42:19 INFO Executor: Finished task 18.0 in stage 5.0 (TID 209). 923 bytes result sent to driver
18/02/07 07:42:19 INFO CoarseGrainedExecutorBackend: Got assigned task 218
18/02/07 07:42:19 INFO Executor: Running task 25.0 in stage 5.0 (TID 218)
18/02/07 07:42:19 INFO BlockManager: Found block rdd_30_25 locally
18/02/07 07:42:35 INFO TableWriter: Wrote 29427 rows to keyspace.table in 16.509 s.
18/02/07 07:42:35 INFO Executor: Finished task 24.0 in stage 5.0 (TID 217). 923 bytes result sent to driver
18/02/07 07:42:35 INFO CoarseGrainedExecutorBackend: Got assigned task 225