import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

val sc = sparkSession.sparkContext
val df = sparkSession.read.json("D:\\tempData\\dimuser.json")
import sparkSession.implicits._
// Count the entries in each user's devices array
val my_size = udf { subjects: Seq[Row] => subjects.size }
df.select($"username", my_size($"devices").alias("devcount"))
  .write.parquet("D:\\tempData\\userdata.parquet")
The program computes the number of devices per user. I am trying to do it with my own UDF rather than the built-in size function. I can run show() and preview the result fine, but when I try to write the data to a file I get the error below. It looks like a type mismatch, yet the type in the message is just the schema of the JSON file:
Caused by: org.apache.spark.SparkException: Failed to execute user defined function(anonfun$1: (array<struct<_id:struct<$oid:string>,ct:struct<$date:string>,devicetoken:string,isInstalledApp:bigint,isinstalledapp:bigint,ismaster:bigint,lastloginduration:bigint,lastloginlocation:string,lastloginstatus:bigint,lastlogintime:struct<$date:string>,mac:string,name:string,os:string,status:string,type:string,uuid:string>>) => int)
Answer 0 (score: 0)
Look further down the stack trace for another 'Caused by'; it is most likely a NullPointerException:
Caused by: org.apache.spark.SparkException: Failed to execute user defined function...
...
Caused by: java.lang.NullPointerException
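This fits the symptom in the question: show() only evaluates the first few rows, while write scans every record, so a user whose devices array is missing is hit only during the write. A quick check to confirm this, as a sketch against the df from the question (not part of the original answer):

// A count greater than 0 means the UDF will receive null for some rows
df.filter($"devices".isNull).count()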
Guard against the null inside the UDF, returning 0 when the array is missing:

val my_size = udf { subjects: Seq[Row] => if (subjects == null) 0 else subjects.size }
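With the guard in place, the select/write from the question should succeed. One extra assumption in this sketch: the failed run may have left a partial D:\\tempData\\userdata.parquet directory behind, so overwrite mode is used:

df.select($"username", my_size($"devices").alias("devcount"))
  .write.mode("overwrite").parquet("D:\\tempData\\userdata.parquet")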
Or wrap the result in an Option, which makes the output column nullable instead of defaulting to 0:

val my_size = udf { subjects: Seq[Row] => if (subjects == null) None else Some(subjects.size) }
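The question deliberately avoided the built-in size function, but if that was only about the null handling, the same null-safe count can be built from column functions with no UDF at all. A sketch assuming Spark 2.x, where the built-in size returns -1 for a null array, hence the explicit when/otherwise:

import org.apache.spark.sql.functions.{size, when}

// 0 for users with no devices array, the array length otherwise
val devcount = when($"devices".isNull, 0).otherwise(size($"devices"))
df.select($"username", devcount.alias("devcount"))
  .write.parquet("D:\\tempData\\userdata.parquet")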