I have a very simple program in Apache Spark (Apache Zeppelin, Spark 2.0). It is a UDF that does nothing meaningful (just a training exercise), but I am getting an error and would like to know what went wrong:
val customer = sql("SELECT * FROM foodmart.customer")
val myUdf = udf((a: String, b: String) => {
  val b = sql("SELECT * FROM foodmart.product")
  "a"
})
customer.withColumn("Contact1", myUdf($"address1", $"address2")).show()
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 12, sandbox.hortonworks.com, executor 3): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$1: (string, string) => string)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at $$$$bec6d1991b88c272b3efac29d720f546$$$$anonfun$1.apply(<console>:68)
at $$$$bec6d1991b88c272b3efac29d720f546$$$$anonfun$1.apply(<console>:67)
Is it possible to use a Spark SQL select statement inside a UDF? Can you tell me what mistake I made?
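For context, here is a sketch of the pattern I have seen suggested instead: run the second query once on the driver and combine the two datasets with a join, rather than calling sql() per row inside the UDF body. This is a minimal, untested sketch; it assumes a SparkSession named `spark`, Hive support, and a join key (`product_id`) that is purely hypothetical for these tables.

```scala
import org.apache.spark.sql.SparkSession

object UdfLookupSketch {
  def main(args: Array[String]): Unit = {
    // Assumed setup: a SparkSession with Hive support so the
    // foodmart tables are visible. Names here are illustrative.
    val spark = SparkSession.builder()
      .appName("udf-lookup-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Issue both queries on the driver. DataFrames, SparkSession,
    // and sql() are driver-side objects; inside a UDF the code runs
    // on executors, where they are not available.
    val customer = spark.sql("SELECT * FROM foodmart.customer")
    val product  = spark.sql("SELECT * FROM foodmart.product")

    // Combine the datasets with a join instead of querying
    // foodmart.product once per customer row from a UDF.
    // "product_id" is a hypothetical join column.
    val joined = customer.join(product, Seq("product_id"), "left")
    joined.show()

    spark.stop()
  }
}
```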