I created a DataFrame and wrote a UDF that converts lowercase to uppercase. But when I try to call nameUdf, I get a NullPointerException.
case class Employee(id:Int, name:String, salary:Double)
val empList=List("111,aaa,20000.0", "222,bbb,300.00", "333,ccc,4000.00")
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val empDF=sc.parallelize(empList).map{line=>
  val data=line.split(",")
  Employee(data(0).toInt,data(1),data(2).toDouble)
}.toDF()
empDF.withColumn("NAME_UP",convert($"name")).show()
val nameUdf=udf{(name:String)=>name.toUpperCase}
val convert=udf[String,String](name=>name.toUpperCase)
The exception is as follows:
18/03/21 14:08:10 INFO BlockManagerMaster: Registered BlockManager
Exception in thread "main" java.lang.NullPointerException
at com.org.test.UDFTest$.delayedEndpoint$com$org$test$UDFTest$1(UDFTest.scala:22)
at com.org.test.UDFTest$delayedInit$body.apply(UDFTest.scala:8)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at com.org.test.UDFTest$.main(UDFTest.scala:8)
at com.org.test.UDFTest.main(UDFTest.scala)
18/03/21 14:08:12 INFO SparkContext: Invoking stop() from shutdown hook
How can I call a UDF on an existing DataFrame?
Answer 0 (score: 4)
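Make sure each UDF is defined before it is used, and define the case class outside the object's scope:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object QuickTest extends App {
  val spark = SparkSession.builder().appName("test").master("local[*]").getOrCreate()
  val empList = List("111,aaa,20000.0", "222,bbb,300.00", "333,ccc,4000.00")
  import spark.implicits._
  // Parse each "id,name,salary" line into an Employee and build the DataFrame
  val empDF = spark.sparkContext.parallelize(empList).map { line =>
    val data = line.split(",")
    Employee(data(0).toInt, data(1), data(2).toDouble)
  }.toDF()
  // Both UDFs are defined before the withColumn call that uses one of them
  val nameUdf = udf { (name: String) => name.toUpperCase }
  val convert = udf[String, String](name => name.toUpperCase)
  empDF.withColumn("NAME_UP", convert($"name")).show()
} // end of object QuickTest

// Defined at the top level, outside the object
case class Employee(id: Int, name: String, salary: Double)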
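The ordering advice most likely matters because of how Scala initializes an object: its body runs top to bottom, so a val that is referenced above its own definition is still null at that point. In the question, convert is used on the line before it is defined, which is consistent with the NullPointerException. The sketch below shows the same effect without Spark; ForwardRefDemo, describe, early, late and after are hypothetical names chosen for illustration only.

```scala
// Hypothetical names; a Spark-free sketch of object initialization order.
object ForwardRefDemo {
  // Helper that reports whether its argument has been initialized yet.
  def describe(s: String): String = if (s == null) "null" else s

  // Fields are initialized top to bottom, so `late` is still null here.
  // The compiler accepts this but may warn about the forward reference.
  val early: String = describe(late)

  val late: String = "ready"

  // By this line `late` has been assigned.
  val after: String = describe(late)
}

object ForwardRefMain {
  def main(args: Array[String]): Unit = {
    println(ForwardRefDemo.early) // prints "null"
    println(ForwardRefDemo.after) // prints "ready"
  }
}
```

Moving a definition above its first use, as the answer's code does for convert, avoids the null value.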
with tf.Graph().as_default():
    ph = tf.constant([1., 2., 3.])
    v = tf.get_variable('v', (3,))
    loss = tf.square(ph-v)
    optimizer = tf.train.GradientDescentOptimizer(0.1)
    trainer = optimizer.minimize(loss)
    gradients = optimizer.compute_gradients(loss)[0][0]
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        grad_mat = sess.run(gradients)
        v_0 = sess.run(v)
        sess.run(trainer)
        v_1 = sess.run(v)
    print(grad_mat)
    print(v_0)
    print(v_0 - 0.1*grad_mat)
    print(v_1)