Getting a NullPointerException when using a Spark UDF

Time: 2018-03-21 08:49:59

Tags: scala apache-spark apache-spark-sql

I created a DataFrame and wrote a UDF that converts lowercase to uppercase.

But when I try to call nameUdf, I get a NullPointerException.

case class Employee(id:Int, name:String, salary:Double)           
val empList=List("111,aaa,20000.0", "222,bbb,300.00", "333,ccc,4000.00")

val sqlContext = new SQLContext(sc)

import sqlContext.implicits._

val empDF=sc.parallelize(empList).map{line=>
    val data=line.split(",")
    Employee(data(0).toInt,data(1),data(2).toDouble)
}.toDF()

empDF.withColumn("NAME_UP",convert($"name")).show()

val nameUdf=udf{(name:String)=>name.toUpperCase}
val convert=udf[String,String](name=>name.toUpperCase)

The exception is shown below:

18/03/21 14:08:10 INFO BlockManagerMaster: Registered BlockManager
Exception in thread "main" java.lang.NullPointerException
    at com.org.test.UDFTest$.delayedEndpoint$com$org$test$UDFTest$1(UDFTest.scala:22)
    at com.org.test.UDFTest$delayedInit$body.apply(UDFTest.scala:8)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at com.org.test.UDFTest$.main(UDFTest.scala:8)
    at com.org.test.UDFTest.main(UDFTest.scala)
18/03/21 14:08:12 INFO SparkContext: Invoking stop() from shutdown hook

How can I call a UDF on an existing DataFrame?

1 answer:

Answer 0: (score: 4)

Make sure you define the UDF before you use it, and define the case class outside the scope of the object:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object QuickTest extends App {
  val spark = SparkSession.builder().appName("test").master("local[*]").getOrCreate()
  val empList = List("111,aaa,20000.0", "222,bbb,300.00", "333,ccc,4000.00")

  import spark.implicits._

  val empDF = spark.sparkContext.parallelize(empList).map { line =>
    val data = line.split(",")
    Employee(data(0).toInt, data(1), data(2).toDouble)
  }.toDF()

  // Define the UDFs before the line that uses them
  val nameUdf = udf { (name: String) => name.toUpperCase }
  val convert = udf[String, String](name => name.toUpperCase)

  empDF.withColumn("NAME_UP", convert($"name")).show()
} // end of object QuickTest

// Defined at the top level, outside the object
case class Employee(id: Int, name: String, salary: Double)
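Why the ordering matters can be shown without Spark. The sketch below is only an illustration: `InitOrderDemo`, `Broken`, `Fixed` and `seenAtUse` are made-up names, and a plain `String => String` function stands in for the UDF. Member vals in an object body are assigned in textual order, so a val read above its definition is still null. That is most likely what happens to `convert` in the question, where it is used on the line before it is defined. The compiler may warn about the reference to an uninitialized value, but it still compiles.

```scala
object InitOrderDemo {
  // Broken ordering: `seenAtUse` reads `convert` before the next line has run,
  // so it captures null (the same situation as calling the UDF too early).
  object Broken {
    val seenAtUse: String => String = convert
    val convert: String => String = _.toUpperCase
  }

  // Fixed ordering: `convert` is assigned first, then used.
  object Fixed {
    val convert: String => String = _.toUpperCase
    val seenAtUse: String => String = convert
  }

  def main(args: Array[String]): Unit = {
    println(Broken.seenAtUse == null) // true: calling it would throw NullPointerException
    println(Fixed.seenAtUse("aaa"))   // AAA
  }
}
```

Declaring the value as a `lazy val` would also avoid the null, but simply moving the definition above its first use, as in the answer, is the more direct fix.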
