Spark optimizer gives wrong output

Time: 2017-09-26 08:51:21

Tags: scala apache-spark

CODE HERE...   
    // Maps the next device message to a measure label.
    def getMeasure = udf((next: String) => {
      if (next.equals("[ArStart]")) "Reboot"
      else if (next.equals("[MONITOR]")) "Reload"
      else "Others"
    })

    val systemWindow = Window
      .partitionBy($"company", $"location", $"systemname")
      .orderBy($"timestamp", $"rn")

    // Keep only rows that have a device message, then label each row
    // with the measure derived from the following row in its window.
    val tmp_dev_msg_1 = ds_state_2.where($"replaced_device_message".isNotNull)
    val tmp_dev_msg_2 = tmp_dev_msg_1
      .withColumn("measure", getMeasure(lead($"replaced_device_message", 1).over(systemWindow)))
    val test = tmp_dev_msg_2.select($"timestamp", $"rn", $"measure")
    val ds_measure_1 = ds_state_2.join(test, Seq("timestamp", "rn"), "leftouter")

MORE CODE HERE

This code works fine without the join statement. Printing the test dataset shows only entries where replaced_device_message is not null.

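When I run it with the join statement, I get a NullPointerException in the function getMeasure. This should not be possible, because I am reading tmp_dev_msg_1($"replaced_device_message"), which cannot contain any null values.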
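One thing worth checking: lead(col, 1) returns null for the last row of every window partition, because that row has no successor. The UDF can therefore receive null even when the filtered column itself has none, and next.equals(...) on null throws a NullPointerException. Below is a minimal sketch of a null-safe mapping in plain Scala. The names Measure and measureOf are hypothetical, and mapping a missing next message to "Others" is an assumption, not something the question states.

```scala
object Measure {
  // Null-safe variant of the mapping used in getMeasure.
  // Option(null) is None, so a missing next message falls through
  // to "Others" (an assumed choice, adjust as needed).
  def measureOf(next: String): String = Option(next) match {
    case Some("[ArStart]") => "Reboot"
    case Some("[MONITOR]") => "Reload"
    case _                 => "Others"
  }

  def main(args: Array[String]): Unit = {
    println(measureOf("[ArStart]")) // prints "Reboot"
    println(measureOf(null))        // prints "Others"
  }
}
```

In a Spark job this could be wrapped as udf(Measure.measureOf _) with org.apache.spark.sql.functions.udf and used in place of getMeasure. Whether this is the actual cause in the job above is not confirmed by the question.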

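Edit: the faulty behavior shows up for entries whose timestamp is not unique. As far as I understand, this should not be a problem, because I also have a row_number (rn), and that is unique.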
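The uniqueness assumption on the join key can be tested directly. The following is a rough sketch that lists every (timestamp, rn) combination occurring more than once. The object JoinKeyCheck, the helper duplicateKeys, and the sample rows are all hypothetical. The sample assumes rn restarts for each system, which the question does not say.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, lit}

object JoinKeyCheck {
  // Returns the key combinations that occur more than once in df.
  def duplicateKeys(df: DataFrame, keys: Seq[String]): DataFrame =
    df.groupBy(keys.map(col): _*)
      .agg(count(lit(1)).as("n"))
      .filter(col("n") > 1)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("join-key-check")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample: two systems share timestamp 100 and rn 1,
    // so the pair (timestamp, rn) is not a unique join key here.
    val sample = Seq(
      ("sysA", 100L, 1),
      ("sysB", 100L, 1),
      ("sysA", 200L, 2)
    ).toDF("systemname", "timestamp", "rn")

    duplicateKeys(sample, Seq("timestamp", "rn")).show()
    spark.stop()
  }
}
```

If such a check on ds_state_2 returns any rows, the left outer join on Seq("timestamp", "rn") would match one row against several, and adding the partition columns to the join key might be worth trying.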

  

Error message:

    org.apache.spark.SparkException: Failed to execute user defined function(anonfun$getMeasure$1$1: (string) => string)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:148)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.NullPointerException
        at Stability$$anonfun$getMeasure$1$1.apply(Stability.scala:165)
        at Stability$$anonfun$getMeasure$1$1.apply(Stability.scala:164)

Any ideas?

0 answers:

No answers