Spark SQL query: java.lang.NullPointerException

Date: 2018-04-01 11:22:37

Tags: scala apache-spark user-defined-functions

I am trying to join 2 hive tables, omega and card, as below:

Table omega:

+------+--------+-------+-----+-----+
|pid   |enventid|card_id|count|name |
+------+--------+-------+-----+-----+
|111111|"sk"    |"pro"  |2    |"aaa"|
|222222|"sk"    |"pro"  |2    |"ddd"|
+------+--------+-------+-----+-----+

Table card:

+-------+---------+
|card_id|card_desc|
+-------+---------+
|"pro"  |"1|2|3"  |
+-------+---------+

Then I defined a udf:

val getListUdf = udf((raw: String) => raw.split("|"))

Now, I try to join the 2 tables using the defined udf:

omega.join(card, Seq("card_id"), "left_outer").withColumn("card_desc", getListUdf(col("card_desc")))

But I got this error:

java.lang.NullPointerException

How can I fix it? Can anyone help me? Thanks.

2 Answers:

Answer 0 (score: 1)

Apparently, you are passing null into the UDF, which causes the NullPointerException (split is being called on null). Try:

.withColumn("card_desc", 
            when(
              col("card_desc").isNotNull,
              getListUdf(col("card_desc"))
            )
        )
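The guard matters because the udf body ultimately calls Java's String.split on the column value, and calling any method on a null reference throws. A minimal Java sketch of that failure mode and of the null guard (variable and class names are illustrative, not from the original post):

```java
public class NullSplitDemo {
    public static void main(String[] args) {
        // What a non-matching left-outer row delivers to the udf: a null value.
        String cardDesc = null;

        // Calling split on the null reference throws, just like inside the udf.
        boolean threw = false;
        try {
            cardDesc.split("\\|");
        } catch (NullPointerException e) {
            threw = true;
        }
        System.out.println("threw NPE: " + threw); // threw NPE: true

        // Checking for null first, as when(col("card_desc").isNotNull, ...) does,
        // avoids the call entirely.
        String[] parts = (cardDesc == null) ? new String[0] : cardDesc.split("\\|");
        System.out.println(parts.length); // 0
    }
}
```

Note that with the `when(...)` version and no `otherwise`, unmatched rows simply keep a null `card_desc` instead of an array.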

Answer 1 (score: 1)

Since you join the two dataframes with a left-outer join, the card_desc column will contain null for rows whose card_id in the omega dataframe has no matching card_id in the card dataframe. When the udf's split function tries to split those null values, you get the NullPointerException.

I suggest you use the built-in split function, which handles the nulls for you:

omega.join(card, Seq("card_id"), "left_outer")
     .withColumn("card_desc", split(col("card_desc"), "\\|"))

The built-in split function does exactly what your udf function does.

Or you can change your udf function to handle null explicitly:

val getListUdf = udf((raw: String) => raw match {
  case null => Array.empty[String]
  case _    => raw.split("\\|")
})
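Note that both snippets above escape the pipe as "\\|", while the question's udf uses a bare "|". That difference matters on its own: Scala's split on a String delegates to java.lang.String.split, which treats its argument as a regular expression, and an unescaped "|" is an alternation of two empty patterns that matches between every character. A plain Java sketch of the underlying behavior (class name is just for illustration):

```java
public class SplitEscapeDemo {
    public static void main(String[] args) {
        // Unescaped "|" is a regex matching the empty string, so the input
        // is split between every character: ["1", "|", "2", "|", "3"].
        String[] bad = "1|2|3".split("|");
        System.out.println(bad.length); // 5

        // Escaping the pipe splits on the literal '|' character.
        String[] good = "1|2|3".split("\\|");
        System.out.println(String.join(",", good)); // 1,2,3
    }
}
```

So even on non-null rows, the original udf would not have produced the intended Array("1", "2", "3").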
