How to write data to Parquet when using a UDF over an array

Asked: 2017-05-02 07:11:24

Tags: apache-spark

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

val sc = sparkSession.sparkContext
val df = sparkSession.read.json("D:\\tempData\\dimuser.json")
import sparkSession.implicits._

val my_size = udf { subjects: Seq[Row] => subjects.size }
df.select($"username", my_size($"devices").alias("devcount"))
  .write.parquet("D:\\tempData\\userdata.parquet")

The program counts the number of devices for each user. I am trying to write my own UDF instead of using the built-in `size` function. I can run `show()` to preview the result, but when I try to write the data to a file I get the following error. It looks like a type mismatch, yet the error only shows the schema of the JSON file:

Caused by: org.apache.spark.SparkException: Failed to execute user defined function(anonfun$1: (array<struct<_id:struct<$oid:string>,ct:struct<$date:string>,devicetoken:string,isInstalledApp:bigint,isinstalledapp:bigint,ismaster:bigint,lastloginduration:bigint,lastloginlocation:string,lastloginstatus:bigint,lastlogintime:struct<$date:string>,mac:string,name:string,os:string,status:string,type:string,uuid:string>>) => int)
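For reference, the struct fields in that schema suggest input records shaped roughly like this (a hypothetical example reconstructed from the error message, not the actual data; a user whose `devices` field is missing or null is the case that matters here):

```json
{"username": "alice",
 "devices": [{"devicetoken": "abc", "name": "phone", "os": "android",
              "status": "1", "type": "mobile", "uuid": "u-1"}]}
```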

1 Answer:

Answer 0 (score: 0)

Check the error trace for another "Caused by" further down; it is most likely a NullPointerException:

Caused by: org.apache.spark.SparkException: Failed to execute user defined function ...
...
Caused by: java.lang.NullPointerException

val my_size = udf { subjects: Seq[Row] => if (subjects == null) 0 else subjects.size }

Or wrap the result in an Option:

val my_size = udf { subjects: Seq[Row] => if (subjects == null) None else Some(subjects.size) }
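As a plain-Scala illustration (no Spark required) of why the null check matters: calling `.size` on a null `Seq` throws a NullPointerException, while the two guarded variants above return a default value or an `Option`. The function names here are hypothetical, chosen only to mirror the two UDF bodies.

```scala
// Minimal sketch of the null-handling logic inside the two UDF bodies above.
// A row whose "devices" column is null reaches the UDF as a null Seq.
val safeSize: Seq[Any] => Int = s => if (s == null) 0 else s.size
val optSize: Seq[Any] => Option[Int] = s => if (s == null) None else Some(s.size)

println(safeSize(null))          // 0 instead of a NullPointerException
println(safeSize(Seq(1, 2, 3)))  // 3
println(optSize(null))           // None; Spark renders a None as a null cell
```

Returning `Option[Int]` makes Spark infer a nullable `int` column, so rows with no devices come out as null in the Parquet file rather than 0; pick whichever semantics fits the data.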