Spark: OutOfMemoryError: Java heap space

Date: 2015-06-24 07:46:56

Tags: java apache-spark

I am using Spark 1.2 with Cassandra (2 workers with 8 GB each), and I am getting an `OutOfMemoryError: Java heap space` exception. The error appears when I run my algorithm on a large amount of data (15M rows). Here is the error:

ver-akka.remote.default-remote-dispatcher-6] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: Java heap space
at org.spark_project.protobuf.ByteString.copyFrom(ByteString.java:192)
at org.spark_project.protobuf.CodedInputStream.readBytes(CodedInputStream.java:324)
at akka.remote.ContainerFormats$SelectionEnvelope.<init>(ContainerFormats.java:223)
at akka.remote.ContainerFormats$SelectionEnvelope.<init>(ContainerFormats.java:173)
at akka.remote.ContainerFormats$SelectionEnvelope$1.parsePartialFrom(ContainerFormats.java:282)
at akka.remote.ContainerFormats$SelectionEnvelope$1.parsePartialFrom(ContainerFormats.java:277)
at org.spark_project.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:141)
at org.spark_project.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
at org.spark_project.protobuf.AbstractParser.parseFrom(AbstractParser.java:188)
at org.spark_project.protobuf.AbstractParser.parseFrom(AbstractParser.java:193)
at org.spark_project.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
at akka.remote.ContainerFormats$SelectionEnvelope.parseFrom(ContainerFormats.java:494)
at akka.remote.serialization.MessageContainerSerializer.fromBinary(MessageContainerSerializer.scala:62)
at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
at scala.util.Try$.apply(Try.scala:161)
at akka.serialization.Serialization.deserialize(Serialization.scala:98)
at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:58)
at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:58)
at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:76)
at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/06/23 17:46:26 INFO DAGScheduler: Job 1 failed: reduce at CustomerJourney.scala:135, took 60.447249 s
Exception in thread "main" 15/06/23 17:46:26 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:702)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:701)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:701)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1428)
at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundPostStop(DAGScheduler.scala:1375)
at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
at akka.actor.ActorCell.terminate(ActorCell.scala:369)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/06/23 17:46:26 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.

I tried changing the configuration of `spark.kryoserializer.buffer.mb` and `spark.storage.memoryFraction`, but I still get the same error.
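For context, this is roughly how those properties are set in a Spark 1.2 application (a sketch, not a recommendation; the memory sizes are placeholder values, and note that driver heap size generally has to be passed to `spark-submit` via `--driver-memory` before the driver JVM starts, so setting it in `SparkConf` may have no effect):

```java
import org.apache.spark.SparkConf;

// Sketch of the memory-related settings mentioned above (Spark 1.2 property names).
// Sizes here are illustrative placeholders only.
SparkConf conf = new SparkConf()
    .setAppName("CustomerJourney")
    .set("spark.executor.memory", "6g")            // heap per executor
    .set("spark.storage.memoryFraction", "0.4")    // share of heap used for caching
    .set("spark.kryoserializer.buffer.mb", "64");  // Kryo serialization buffer, in MB
```

Since the stack trace above shows the OOM inside the `sparkDriver` ActorSystem, the driver heap (not only the executors') is a plausible place to look.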

I also took a heap dump and analyzed it with Eclipse MAT to identify which objects were causing the problem. Here is the result: [heap dump screenshot]

The `java.lang.ref.Finalizer` class is eating a lot of the memory. After some searching on Google, I found this question: is memory leak? why java.lang.ref.Finalizer eat so much memory

Here is the answer given to that question:

Some classes implement the Object.finalize() method. Objects which override this method need to be handled by a background finalizer thread, and they can't be cleaned up until this happens. If these tasks are short and you don't discard many of these objects, it all works well. However, if you are creating lots of these objects and/or their finalizers take a long time, the queue of objects to be finalized builds up. It is possible for this queue to use up all the memory.
The solutions are:

- don't use finalize()d objects if you can (if you are writing the class for the object)
- make finalize() very short (if you have to use it)
- don't discard such objects every time (try to re-use them)
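The "re-use instead of discard" advice can be sketched with a minimal object pool (hypothetical `Connection`/`ConnectionPool` names, not from the original code): recycling instances means fewer objects are ever garbage-collected, so nothing piles up in the finalizer queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical resource. If it overrode finalize(), every discarded instance
// would wait in java.lang.ref.Finalizer's queue for the single finalizer thread.
class Connection {
    // finalize() intentionally NOT overridden: instances are recycled instead.
}

// Minimal pool: hand instances back rather than discarding them each iteration.
class ConnectionPool {
    private final Deque<Connection> free = new ArrayDeque<>();

    Connection borrow() {
        Connection c = free.poll();           // reuse a pooled instance if any
        return (c != null) ? c : new Connection();
    }

    void release(Connection c) {
        free.push(c);                         // return to the pool, don't discard
    }
}

public class FinalizerDemo {
    public static void main(String[] args) {
        ConnectionPool pool = new ConnectionPool();
        Connection first = pool.borrow();
        pool.release(first);
        // Re-borrowing yields the same instance: no garbage, no finalizer work.
        Connection second = pool.borrow();
        System.out.println(first == second);  // true
    }
}
```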

But in my case, I can't find any class that I use that relies on this Finalizer class.

0 Answers:

No answers yet