Spark MapPartition NullPointerException error

Time: 2019-05-27 03:19:32

Tags: scala apache-spark rdd

I am running a simple project on a YARN cluster to:

  • read a text file on S3 into an RDD[String]
  • define a schema and convert that RDD to a DataFrame

I am running mapPartitions on the RDD to convert the RDD[String] to an RDD[Row]. My problem: I get a java.lang.NullPointerException, and I can't figure out what is causing it.
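For reference, the conversion described above can be sketched roughly as follows. This is a minimal sketch and not the original code: the object name `RowConversionSketch`, the helper `toRows`, the S3 path, the `<key>,<value>` line format, the regex, and the column names are all assumptions made for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical object; it defines main() rather than extending scala.App.
object RowConversionSketch {

  // Per-partition conversion. The regex is a local val, so it is created on
  // whichever JVM runs the task and does not depend on driver-side state.
  def toRows(lines: Iterator[String]): Iterator[Row] = {
    val KeyValue = """([^,]+),(.*)""".r // assumed line format: "<key>,<value>"
    // collect keeps only the lines that match the pattern.
    lines.collect { case KeyValue(k, v) => Row(k, v) }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("row-conversion-sketch").getOrCreate()

    // Hypothetical path; replace with the real S3 location.
    val rdd1 = spark.sparkContext.textFile("s3a://some-bucket/some-file.txt")

    // Assumed two-column schema of nullable strings.
    val schema = StructType(Seq(
      StructField("key", StringType, nullable = true),
      StructField("value", StringType, nullable = true)
    ))

    val rowRdd = rdd1.mapPartitions(toRows)        // RDD[String] => RDD[Row]
    val df = spark.createDataFrame(rowRdd, schema) // RDD[Row] + schema => DataFrame
    df.show(5)

    spark.stop()
  }
}
```

Because `toRows` only touches its argument and a locally defined regex, it can also be exercised on a plain `Iterator[String]` without a cluster.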

The stack trace points to these 2 line numbers in the source code:

  • the line with rdd1.mapPartition
  • inside the anonymous function, the line with the match case for the regular match

Here is an excerpt of the stack trace:

Caused by: java.lang.NullPointerException
    at packageA.Herewego$$anonfun$3.apply(Herewego.scala:107)
    at packageA.Herewego$$anonfun$3.apply(Herewego.scala:88)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

What I have tried:

  • The error occurs when running in YARN cluster mode, but not when running in local mode (in my IDE). This makes me think that something is not defined on the executors? I moved the createrow function def inside the anonymous function def, but that did not help.

Here is the code block:

[code block not preserved in the source]

Do I need to broadcast any variables or functions used inside mapPartitions? Any pointers in the right direction would be greatly appreciated.
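Some context on the broadcast question, offered as a sketch and not as a diagnosis, since the original code is not available. Local values referenced inside a closure are serialized and shipped with each task, so broadcasting is generally not required for correctness; a broadcast variable mainly avoids re-sending larger read-only data with every task. Fields of a Scala `object` are not shipped with the closure but initialized separately in each executor JVM, and the Spark documentation advises defining a `main()` method instead of extending `scala.App`, because subclasses of `scala.App` may not work correctly. Whether any of this applies here is an assumption. All names below (`CaptureSketch`, `resolve`, `lookup`, `delimiter`) are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object; note that it defines main() and does not extend App.
object CaptureSketch {

  // Pure per-partition helper: it uses only its arguments, so it does not
  // rely on any state that exists solely on the driver.
  def resolve(lines: Iterator[String],
              table: Map[String, String],
              delimiter: String): Iterator[String] =
    lines.map { line =>
      val key = line.split(delimiter)(0) // first field of the line
      table.getOrElse(key, "unknown")
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("capture-sketch").getOrCreate()
    val sc = spark.sparkContext

    // A local val: captured by the closure and serialized with each task.
    val delimiter = ","

    // Read-only lookup data: broadcast so it is sent to each executor once.
    val lookup = sc.broadcast(Map("a" -> "alpha", "b" -> "beta"))

    val rdd1 = sc.parallelize(Seq("a,1", "b,2", "c,3"))
    val resolved = rdd1.mapPartitions { lines =>
      // lookup.value is read on the executor that runs the task.
      resolve(lines, lookup.value, delimiter)
    }

    resolved.collect().foreach(println)
    spark.stop()
  }
}
```

The design choice in this sketch is to pass everything the per-partition function needs either as a local value or as a broadcast handle, so nothing depends on object fields having been initialized on the executor.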

0 answers:

No answers