We have a Spark Streaming application that reads a Hive table every batch interval and does some aggregation. Normally it is very fast and finishes within 20 seconds. However, after running for a few hours, one executor becomes very slow while all the other executors keep running fast. I checked the slow executor's host: disk and network are fine, and executors belonging to other applications on the same host are all healthy.
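For reference, a minimal sketch of the kind of job described (Scala, Spark 1.x-era API to match the TungstenAggregationIterator frames in the dumps below; the database, table, and column names are hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.dstream.ConstantInputDStream
    import org.apache.spark.sql.hive.HiveContext

    object HiveAggStream {
      def main(args: Array[String]): Unit = {
        val sc   = new SparkContext(new SparkConf().setAppName("hive-agg-stream"))
        val ssc  = new StreamingContext(sc, Seconds(30))
        val hive = new HiveContext(sc)

        // The constant stream is only a per-interval "tick"; the real work is
        // the Hive scan + aggregation done once per batch.
        val ticks = new ConstantInputDStream(ssc, sc.parallelize(Seq(0)))
        ticks.foreachRDD { _ =>
          // This is the path visible in the thread dumps below:
          // HadoopTableReader scans the text-backed table, then
          // TungstenAggregationIterator consumes the rows.
          hive.sql("SELECT some_key, count(*) FROM some_db.some_table GROUP BY some_key")
            .collect()
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }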
The slow executor's CPU usage is 100%. I took thread dumps of the slow executor; it is reading from HDFS, with the stacks below. The task reads about 128 MB of data, which should finish within a few seconds, but on this slow executor it takes more than 90 seconds.
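For scale, the implied read rate on the slow executor works out to roughly 1.4 MB/s, far below what a healthy local disk or NIC delivers. A quick back-of-envelope in Scala, using only the numbers above:

    // Back-of-envelope check of the numbers above (not application code):
    val splitBytes = 128L * 1024 * 1024   // one ~128 MB input split
    val slowSecs   = 90.0                 // time observed on the slow executor
    println(f"implied throughput: ${splitBytes / slowSecs / (1024 * 1024)}%.1f MB/s")
    // => about 1.4 MB/s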
Any ideas? Thanks in advance.
2017-06-15 15:04:04
"Executor task launch worker-172" daemon prio=10 tid=0x00007fb308001000 nid=0x3966 runnable [0x00007fb338a4c000]
java.lang.Thread.State: RUNNABLE
at sun.nio.cs.UTF_8$Decoder.decodeArrayLoop(UTF_8.java:211)
at sun.nio.cs.UTF_8$Decoder.decodeLoop(UTF_8.java:354)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:561)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:783)
at org.apache.hadoop.io.Text.decode(Text.java:405)
at org.apache.hadoop.io.Text.decode(Text.java:382)
at org.apache.hadoop.io.Text.toString(Text.java:280)
at org.apache.spark.sql.hive.HiveInspectors$class.unwrap(HiveInspectors.scala:322)
at org.apache.spark.sql.hive.HadoopTableReader$.unwrap(TableReader.scala:315)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$14.apply(TableReader.scala:401)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$14.apply(TableReader.scala:401)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:416)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:408)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:511)
at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:686)
2017-06-15 15:04:07
"Executor task launch worker-172" daemon prio=10 tid=0x00007fb308001000 nid=0x3966 runnable [0x00007fb338a4c000]
java.lang.Thread.State: RUNNABLE
at sun.misc.Unsafe.getByte(Native Method)
at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:249)
at java.nio.ByteBuffer.get(ByteBuffer.java:673)
at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:279)
at java.nio.ByteBuffer.get(ByteBuffer.java:694)
at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:304)
at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:203)
at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:138)
- locked <0x00000007bed42868> (a org.apache.hadoop.hdfs.RemoteBlockReader2)
at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:683)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:739)
- locked <0x00000007bed42808> (a org.apache.hadoop.hdfs.DFSInputStream)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:796)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:837)
- locked <0x00000007bed42808> (a org.apache.hadoop.hdfs.DFSInputStream)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:209)
- locked <0x00000007bed41c58> (a org.apache.hadoop.mapred.LineRecordReader)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:249)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:211)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:511)
at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:686)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
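One generic workaround for a single straggling task (explicitly not a root-cause fix) would be speculative execution, so the slow task gets re-launched on a healthy executor. A minimal sketch using the standard Spark configuration options:

    // Workaround sketch, not a root-cause fix: let Spark re-launch the
    // straggling task on another executor. These are standard Spark options
    // (the multiplier/quantile values shown are the defaults).
    val conf = new SparkConf()
      .setAppName("hive-agg-stream")
      .set("spark.speculation", "true")
      .set("spark.speculation.multiplier", "1.5") // "slow" = 1.5x median task time
      .set("spark.speculation.quantile", "0.75")  // check after 75% of tasks finish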