I have a very simple program in Apache Spark (Apache Zeppelin, Spark 2.0). It is a UDF that does nothing meaningful (just a training exercise), but I am getting an error and would like to know what went wrong:
val customer = sql("SELECT * FROM foodmart.customer")
val myUdf = udf((a: String, b: String) => {
  val b = sql("SELECT * FROM foodmart.product")
  "a"
})
customer.withColumn("Contact1", myUdf($"address1", $"address2")).show()
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 12, sandbox.hortonworks.com, executor 3): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$1: (string, string) => string)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at $$$$bec6d1991b88c272b3efac29d720f546$$$$anonfun$1.apply(<console>:68)
at $$$$bec6d1991b88c272b3efac29d720f546$$$$anonfun$1.apply(<console>:67)
Is it possible to use a Spark SQL select statement inside a UDF? Can you tell me what mistake I made?
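For context, here is a sketch of the pattern I have seen suggested instead: run the second query once on the driver and combine the two datasets with a join, rather than calling sql() per row inside the UDF body. This is a minimal, untested sketch; it assumes a SparkSession named `spark`, Hive support, and a join key (`product_id`) that is purely hypothetical for these tables.

```scala
import org.apache.spark.sql.SparkSession

object UdfLookupSketch {
  def main(args: Array[String]): Unit = {
    // Assumed setup: a SparkSession with Hive support so the
    // foodmart tables are visible. Names here are illustrative.
    val spark = SparkSession.builder()
      .appName("udf-lookup-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Issue both queries on the driver. DataFrames, SparkSession,
    // and sql() are driver-side objects; inside a UDF the code runs
    // on executors, where they are not available.
    val customer = spark.sql("SELECT * FROM foodmart.customer")
    val product  = spark.sql("SELECT * FROM foodmart.product")

    // Combine the datasets with a join instead of querying
    // foodmart.product once per customer row from a UDF.
    // "product_id" is a hypothetical join column.
    val joined = customer.join(product, Seq("product_id"), "left")
    joined.show()

    spark.stop()
  }
}
```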