We are running Apache Beam on the Spark runner. Our case is as follows: the scenario below leads to an OutOfMemoryError.
We have 10 million rows on the left side (10M rows with 200 columns, of which 1 million are duplicate rows) and 10 million rows on the right side (10M rows with 200 columns, all unique), and we perform a join (FullOuterJoin / CoGroupByKey) over these 200-column records.
However, when both the left-side and the right-side data contain only unique rows (10M rows with 200 columns, all unique), we do not hit any problem. Is there a solution to this issue?
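The asymmetry is consistent with key skew: CoGroupByKey materializes all values for a given key into a single CoGbkResult, so a million rows sharing duplicate keys must be buffered for one key on one executor, while unique keys never buffer more than one row per side. A common workaround for such hot keys (not from the question itself; sketched below in plain Java with hypothetical names and a hypothetical fan-out of 16) is key salting: split each hot key into sub-keys before grouping, then strip the salt afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Sketch of key salting: fan a hot key out into SALT_BUCKETS sub-keys so no
 * single group has to be materialized in one executor's heap at once.
 * Names and the bucket count are illustrative, not from the original post.
 */
public class SaltedGroup {
    static final int SALT_BUCKETS = 16; // hypothetical fan-out factor

    /** Duplicate-heavy (left) side: append a random salt to each key. */
    static String saltKey(String key) {
        int salt = ThreadLocalRandom.current().nextInt(SALT_BUCKETS);
        return key + "#" + salt;
    }

    /** Unique (right) side: replicate each key under every salt so the join
     *  still matches; the cost is SALT_BUCKETS copies of the smaller side. */
    static List<String> explodeKey(String key) {
        List<String> out = new ArrayList<>();
        for (int salt = 0; salt < SALT_BUCKETS; salt++) {
            out.add(key + "#" + salt);
        }
        return out;
    }

    /** After grouping, strip the salt to recover the original key. */
    static String unsaltKey(String saltedKey) {
        return saltedKey.substring(0, saltedKey.lastIndexOf('#'));
    }

    public static void main(String[] args) {
        String salted = saltKey("hotKey");
        System.out.println(salted);                      // e.g. hotKey#7 (salt is random)
        System.out.println(unsaltKey(salted));           // hotKey
        System.out.println(explodeKey("hotKey").size()); // 16
    }
}
```

In a Beam pipeline the same idea would be applied in a ParDo on each side before the CoGroupByKey and reversed in a ParDo after it; the grouped value lists then top out at roughly 1/16 of their unsalted size.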
spark-submit --class Spark_93774 --deploy-mode client --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true --driver-memory "2g" --executor-memory "8g" --num-executors "6" --executor-cores "3" --conf spark.executor.heartbeatInterval=50s --conf spark.network.timeout=800s
bde3.EXAMPLE.com stdout stderr 2019/12/23 16:10:09 0 ms 13 min 52.7 MB / 10041375 2.0 GB 21.7 MB
ExecutorLostFailure (executor 8 exited caused by one of the running tasks) Reason: Container marked as failed: container_1576483054461_0281_01_000009 on host: bde3.EXAMPLE.com. Exit status: 143. Diagnostics: [2019-12-23 05:53:41.582] Container killed on request. Exit code is 143
[2019-12-23 05:53:41.582] Container exited with a non-zero exit code 143.
[2019-12-23 05:53:41.583] Killed by external signal
2019-12-23 05:41:05 WARN BlockManager:69 - Block rdd_25_9 could not be removed as it was not found on disk or in memory
2019-12-23 05:41:05 WARN BlockManager:69 - Putting block rdd_25_9 failed
2019-12-23 05:41:05 INFO MemoryStore:57 - 5 blocks selected for dropping (436.3 MB bytes)
2019-12-23 05:41:05 INFO BlockManager:57 - Dropping block broadcast_0_piece0 from memory
2019-12-23 05:41:05 INFO BlockManager:57 - Writing block broadcast_0_piece0 to disk
2019-12-23 05:41:05 INFO BlockManager:57 - Dropping block broadcast_0 from memory
2019-12-23 05:41:05 INFO BlockManager:57 - Writing block broadcast_0 to disk
2019-12-23 05:41:05 INFO BlockManager:57 - Dropping block broadcast_1_piece0 from memory
2019-12-23 05:41:05 INFO BlockManager:57 - Writing block broadcast_1_piece0 to disk
2019-12-23 05:41:05 INFO BlockManager:57 - Dropping block broadcast_1 from memory
2019-12-23 05:41:05 INFO BlockManager:57 - Writing block broadcast_1 to disk
2019-12-23 05:41:05 INFO BlockManager:57 - Dropping block rdd_25_3 from memory
2019-12-23 05:41:05 INFO MemoryStore:57 - After dropping 5 blocks, free memory is 436.3 MB
2019-12-23 05:41:05 INFO WriteFiles:510 - Opening writer 82bf2bda-1e8e-4f04-88f5-7441f62aadd4 for window org.apache.beam.sdk.transforms.windowing.GlobalWindow@3d2fc97b pane PaneInfo.NO_FIRING destination null
java.lang.OutOfMemoryError: GC overhead limit exceeded
-XX:OnOutOfMemoryError="kill %p"
Executing /bin/sh -c "kill 100074" ...
2019-12-23 05:53:40 WARN BlockManager:69 - Putting block rdd_30_9 failed due to exception org.apache.beam.sdk.util.UserCodeException: java.lang.OutOfMemoryError: GC overhead limit exceeded.
2019-12-23 05:53:40 ERROR CoarseGrainedExecutorBackend:43 - RECEIVED SIGNAL TERM
2019-12-23 05:53:40 WARN BlockManager:69 - Block rdd_30_9 could not be removed as it was not found on disk or in memory
2019-12-23 05:53:40 INFO DiskBlockManager:57 - Shutdown hook called
2019-12-23 05:53:40 INFO ShutdownHookManager:57 - Shutdown hook called
2019-12-23 05:53:40 INFO ShutdownHookManager:57 - Deleting directory /data3/yarn/nm/usercache/user/appcache/application_1576483054461_0281/spark-7f74f1db-c4fc-4dae-a5f1-53ce5ebeccdb
2019-12-23 05:53:40 INFO ShutdownHookManager:57 - Deleting directory /data2/yarn/nm/usercache/user/appcache/application_1576483054461_0281/spark-450bfdae-f82a-4a07-8e8c-7e175c10c182
2019-12-23 05:53:40 ERROR Executor:94 - Exception in task 9.0 in stage 1.0 (TID 33)
org.apache.beam.sdk.util.UserCodeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:34)
    at org.apache.beam.sdk.transforms.join.CoGroupByKey$ConstructCoGbkResultFn$DoFnInvoker.invokeProcessElement(Unknown Source)
    at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:214)
    at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:179)
    at org.apache.beam.runners.spark.translation.DoFnRunnerWithMetrics.processElement(DoFnRunnerWithMetrics.java:65)
    at org.apache.beam.runners.spark.translation.SparkProcessContext$ProcCtxtIterator.computeNext(SparkProcessContext.java:137)
    at org.apache.beam.vendor.guava.v20_0.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
    at org.apache.beam.vendor.guava.v20_0.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
    at org.apache.beam.runners.spark.translation.SparkProcessContext$ProcCtxtIterator.computeNext(SparkProcessContext.java:135)
    at org.apache.beam.vendor.guava.v20_0.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
    at org.apache.beam.vendor.guava.v20_0.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
    at org.apache.beam.runners.spark.translation.SparkProcessContext$ProcCtxtIterator.computeNext(SparkProcessContext.java:135)
    at org.apache.beam.vendor.guava.v20_0.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
    at org.apache.beam.vendor.guava.v20_0.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
    at org.apache.spark.storage.memory.PartiallyUnrolledIterator.hasNext(MemoryStore.scala:753)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
    at org.apache.beam.runners.spark.translation.SparkProcessContext$ProcCtxtIterator.computeNext(SparkProcessContext.java:135)
    at org.apache.beam.vendor.guava.v20_0.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
    at org.apache.beam.vendor.guava.v20_0.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
    at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
    at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1176)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1167)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1102)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1167)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:893)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1363)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.spark.util.collection.PartitionedPairBuffer$$anon$1.next(PartitionedPairBuffer.scala:93)
    at org.apache.spark.util.collection.PartitionedPairBuffer$$anon$1.next(PartitionedPairBuffer.scala:84)
    at org.apache.spark.util.collection.ExternalSorter$SpillableIterator.readNext(ExternalSorter.scala:816)
    at org.apache.spark.util.collection.ExternalSorter$SpillableIterator.next(ExternalSorter.scala:826)
    at org.apache.spark.util.collection.ExternalSorter$SpillableIterator.next(ExternalSorter.scala:769)
    at scala.collection.Iterator$$anon$1.next(Iterator.scala:1008)
    at scala.collection.Iterator$$anon$1.head(Iterator.scala:995)
    at org.apache.spark.util.collection.ExternalSorter$IteratorForPartition.hasNext(ExternalSorter.scala:758)
    at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:1002)
    at org.apache.spark.util.collection.ExternalSorter$$anon$2.next(ExternalSorter.scala:384)
    at org.apache.spark.util.collection.ExternalSorter$$anon$2.next(ExternalSorter.scala:375)
    at scala.collection.Iterator$$anon$12.next(Iterator.scala:445)
    at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:28)
    at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
    at scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:31)
    at org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Iterators$PeekingImpl.peek(Iterators.java:1128)
    at org.apache.beam.runners.spark.translation.GroupNonMergingWindowsFunctions$GroupByKeyIterator$ValueIterator$1.computeNext(GroupNonMergingWindowsFunctions.java:171)
    at org.apache.beam.vendor.guava.v20_0.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
    at org.apache.beam.vendor.guava.v20_0.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
    at org.apache.beam.sdk.transforms.join.CoGbkResult.<init>(CoGbkResult.java:85)
    at org.apache.beam.sdk.transforms.join.CoGbkResult.<init>(CoGbkResult.java:69)
    at org.apache.beam.sdk.transforms.join.CoGroupByKey$ConstructCoGbkResultFn.processElement(CoGroupByKey.java:192)
    at org.apache.beam.sdk.transforms.join.CoGroupByKey$ConstructCoGbkResultFn$DoFnInvoker.invokeProcessElement(Unknown Source)
    at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:214)
    at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:179)
    at org.apache.beam.runners.spark.translation.DoFnRunnerWithMetrics.processElement(DoFnRunnerWithMetrics.java:65)
    at org.apache.beam.runners.spark.translation.SparkProcessContext$ProcCtxtIterator.computeNext(SparkProcessContext.java:137)
    at org.apache.beam.vendor.guava.v20_0.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
    at org.apache.beam.vendor.guava.v20_0.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
2019-12-23 05:53:40 INFO ShutdownHookManager:57 - Deleting directory /data1/yarn/nm/usercache/user/appcache/application_1576483054461_0281/spark-371786e2-1ff0-4ae4-84b1-b8359b177e69
2019-12-23 05:53:40 INFO Executor:57 - Not reporting error to driver during JVM shutdown.