I'm trying to join two Hive tables, omega and card, shown below.
Table omega:
+------+--------+-------+-----+-----+
|pid   |enventid|card_id|count|name |
+------+--------+-------+-----+-----+
|111111|"sk"    |"pro"  |2    |"aaa"|
|222222|"sk"    |"pro"  |2    |"ddd"|
+------+--------+-------+-----+-----+
Table card:
+-------+---------+
|card_id|card_desc|
+-------+---------+
|"pro"  |"1|2|3"  |
+-------+---------+
Then I defined a UDF:
val getListUdf = udf((raw: String) => raw.split("|"))
Now I try to join the two tables and apply the UDF to the card_desc column:
omega.join(card, Seq("card_id"), "left_outer").withColumn("card_desc", getListUdf(col("card_desc")))
But this fails with errors.
How can I fix this? Can anyone help? Thanks.
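Answer 0 (score: 1)
Apparently you are passing null into the UDF, which causes a NullPointerException (split is called on null). Try:
.withColumn("card_desc",
  when(
    col("card_desc").isNotNull,
    getListUdf(col("card_desc"))
  )
)
Because there is no otherwise clause, rows where card_desc is null keep a null in that column instead of reaching the UDF.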
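The failure can be seen without Spark at all. The following is a minimal plain-Scala sketch, not the asker's actual job: the object name NullSplitDemo and the helpers rawSplit and safeSplit are made up for illustration, and the pipe is escaped here so that only the null behaviour is in play.

```scala
import scala.util.Try

// Hypothetical names, for illustration only.
object NullSplitDemo {
  // Same shape as the UDF body: calls split directly on its argument.
  def rawSplit(raw: String): Array[String] = raw.split("\\|")

  // Null-safe variant: Option(null) is None, so split is never called on null.
  def safeSplit(raw: String): Option[Array[String]] =
    Option(raw).map(_.split("\\|"))

  def main(args: Array[String]): Unit = {
    // A matched row carries a real string.
    println(rawSplit("1|2|3").mkString(","))   // 1,2,3
    // An unmatched row from a left outer join carries null.
    println(Try(rawSplit(null)).isFailure)     // true (NullPointerException)
    println(safeSplit(null).isEmpty)           // true
  }
}
```

The isNotNull guard above plays the same role as Option(raw) here: it keeps the null from ever reaching split.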
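Answer 1 (score: 1)
Because you join the two dataframes with a left outer join, the card_desc column is null for rows whose card_id in the omega dataframe has no match in the card dataframe. When the udf tries to split such a null value, it throws a NullPointerException.
I suggest using the built-in split function, which handles null values:
omega.join(card, Seq("card_id"), "left_outer")
  .withColumn("card_desc", split(col("card_desc"), "\\|"))
The built-in split function does the same job as your udf, and it returns null for null input instead of throwing.
Or you can change your udf to:
val getListUdf = udf((raw: String) => raw match {
  case null => Array.empty[String]
  case _ => raw.split("\\|")
})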
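One detail worth spelling out: the question's UDF splits on "|", while the fixes in Answer 1 split on "\\|". String.split takes a regular expression, and an unescaped pipe is the alternation operator, so the two are not equivalent. Here is a small plain-Scala sketch of the difference, using the sample card_desc value from the question; the object name PipeRegexDemo is hypothetical.

```scala
// Hypothetical name, for illustration only.
object PipeRegexDemo {
  val desc = "1|2|3"

  // Unescaped: "|" is an alternation of two empty patterns, which matches the
  // empty string at every position, so the input is cut into single characters.
  val unescaped: Array[String] = desc.split("|")

  // Escaped: the pipe is matched literally.
  val escaped: Array[String] = desc.split("\\|")

  // Scala's Char overload of split also treats the separator literally.
  val byChar: Array[String] = desc.split('|')

  def main(args: Array[String]): Unit = {
    println(unescaped.mkString("[", ", ", "]"))
    println(escaped.mkString("[", ", ", "]"))   // [1, 2, 3]
    println(byChar.mkString("[", ", ", "]"))    // [1, 2, 3]
  }
}
```

So even after the null problem is handled, the original getListUdf would probably not return the list that was intended unless the pipe is escaped.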