什么时候pyspark会失败" java.lang.AssertionError:断言失败"来自BlockInfo.checkInvariants?

时间:2017-12-03 12:10:48

标签: apache-spark pyspark

我正在使用pyspark并收到以下消息:

17/12/03 11:57:48 WARN TaskSetManager: Lost task 0.0 in stage 5.0 (TID 1800, 172.31.27.9, executor 0): java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:156)
at org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84)
at org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:367)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:366)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:366)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:361)
at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:736)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:342)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

17/12/03 11:57:48 INFO TaskSetManager: Starting task 0.1 in stage 5.0 (TID 1801, 172.31.27.9, executor 0, partition 0, PROCESS_LOCAL, 4871 bytes)
17/12/03 11:57:48 INFO TaskSetManager: Lost task 0.1 in stage 5.0 (TID 1801) on 172.31.27.9, executor 0: java.lang.AssertionError (assertion failed) [duplicate 1]
17/12/03 11:57:48 INFO TaskSetManager: Starting task 0.2 in stage 5.0 (TID 1802, 172.31.27.9, executor 0, partition 0, PROCESS_LOCAL, 4871 bytes)
17/12/03 11:57:48 INFO TaskSetManager: Lost task 0.2 in stage 5.0 (TID 1802) on 172.31.27.9, executor 0: java.lang.AssertionError (assertion failed) [duplicate 2]
17/12/03 11:57:48 INFO TaskSetManager: Starting task 0.3 in stage 5.0 (TID 1803, 172.31.27.9, executor 0, partition 0, PROCESS_LOCAL, 4871 bytes)
17/12/03 11:57:48 INFO TaskSetManager: Lost task 0.3 in stage 5.0 (TID 1803) on     172.31.27.9, executor 0: java.lang.AssertionError (assertion failed) [duplicate 3]
17/12/03 11:57:48 ERROR TaskSetManager: Task 0 in stage 5.0 failed 4 times; aborting job
17/12/03 11:57:48 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool 
17/12/03 11:57:48 INFO TaskSchedulerImpl: Cancelling stage 5
17/12/03 11:57:48 INFO DAGScheduler: ResultStage 5 (runJob at PythonRDD.scala:446) failed in 0.078 s due to Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 1803, 172.31.27.9, executor 0): java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:156)
at org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84)
at org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:367)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:366)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:366)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:361)
at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:736)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:342)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

似乎发生在其他人TaskSetManager: Task 0 in stage 5.0 failed 4 times; aborting job上,但我特别关注这些消息:... Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 1803, 172.31.27.9, executor 0): java.lang.AssertionError: assertion failed

我怀疑写入RDD时会出现问题,写入时可能会出错,而Spark以后无法解析。 我正在使用ALS库和Rating对象。因此,在保存到RDD之前,将RDD映射到

更方便
data.map(lambda x: (x.user, x.product, x.rating)).saveAsTextFile ("hdfs://"+master_ip+":9000/RDD/data")

并读取并解析为

data = sc.textFile("hdfs://"+master_ip+":9000/RDD/data")
data = data.map(lambda x: x[1:-1]).map(lambda x: x.split(", ")).\
              map(lambda x: Rating(int(x[0]), int(x[1]), float(x[2])))

我很好奇,因为这些错误消息并不是每次都出现,并且总是不可再现。    我使用spark 2.2.0和hapdoop 2.7。有人以前见过这个吗?

谢谢!

0 个答案:

没有答案