I create a broadcast variable:
val broadcastIndex = sc.broadcast(index.collectAsMap())
val Ahit = read.map( (x: String) => {
  val r = broadcastIndex.value.get(3906900).get
  r
}).reduce(_ ++ _)
The number "3906900" is the id of one record in index. The error is as follows:
14/12/08 15:30:02 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-31] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: Java heap space
    at com.google.protobuf_spark.ByteString.toByteArray(ByteString.java:213)
    at akka.remote.serialization.MessageContainerSerializer.fromBinary(MessageContainerSerializer.scala:59)
    at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
    at scala.util.Try$.apply(Try.scala:161)
    at akka.serialization.Serialization.deserialize(Serialization.scala:98)
    at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
    at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:55)
    at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:55)
    at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:73)
    at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:764)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/12/08 15:30:02 INFO DAGScheduler: Failed to run collectAsMap at App.scala:183
Exception in thread "main" org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:694)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:693)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
    at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1399)
    at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201)
    at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163)
    at akka.actor.ActorCell.terminate(ActorCell.scala:338)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
    at akka.dispatch.Mailbox.run(Mailbox.scala:218)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Can anyone tell me how to solve this problem? Thanks.