Spark SQL query: java.lang.NullPointerException

Date: 2018-04-01 11:22:37

Tags: scala apache-spark user-defined-functions

I am trying to join 2 hive tables, omega and card, as below:

Table omega:

+------+--------+-------+-----+-----+
|pid   |enventid|card_id|count|name |
+------+--------+-------+-----+-----+
|111111|"sk"    |"pro"  |2    |"aaa"|
|222222|"sk"    |"pro"  |2    |"ddd"|
+------+--------+-------+-----+-----+

Table card:

+-------+---------+
|card_id|card_desc|
+-------+---------+
|"pro"  |"1|2|3"  |
+-------+---------+

Then I defined a udf:

val getListUdf = udf((raw: String) => raw.split("|"))

Now, I try to join the 2 tables using the defined udf:

omega.join(card, Seq("card_id"), "left_outer").withColumn("card_desc", getListUdf(col("card_desc")))

But I got this error:

java.lang.NullPointerException

How can I fix it? Can anyone help me? Thanks.

2 Answers:

Answer 0 (score: 1)

Apparently, you are passing null into the UDF, which causes the NullPointerException (split is being called on null). Try:

.withColumn("card_desc", 
            when(
              col("card_desc").isNotNull,
              getListUdf(col("card_desc"))
            )
        )
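The guard matters because the udf body ultimately calls Java's String.split on the column value, and calling any method on a null reference throws. A minimal Java sketch of that failure mode and of the null guard (variable and class names are illustrative, not from the original post):

```java
public class NullSplitDemo {
    public static void main(String[] args) {
        // What a non-matching left-outer row delivers to the udf: a null value.
        String cardDesc = null;

        // Calling split on the null reference throws, just like inside the udf.
        boolean threw = false;
        try {
            cardDesc.split("\\|");
        } catch (NullPointerException e) {
            threw = true;
        }
        System.out.println("threw NPE: " + threw); // threw NPE: true

        // Checking for null first, as when(col("card_desc").isNotNull, ...) does,
        // avoids the call entirely.
        String[] parts = (cardDesc == null) ? new String[0] : cardDesc.split("\\|");
        System.out.println(parts.length); // 0
    }
}
```

Note that with the `when(...)` version and no `otherwise`, unmatched rows simply keep a null `card_desc` instead of an array.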

Answer 1 (score: 1)

Since you join the two dataframes with a left-outer join, the card_desc column will contain null for rows whose card_id in the omega dataframe has no matching card_id in the card dataframe. When the udf's split function tries to split those null values, you get the NullPointerException.

I suggest you use the built-in split function, which handles the nulls for you:

omega.join(card, Seq("card_id"), "left_outer")
     .withColumn("card_desc", split(col("card_desc"), "\\|"))

The built-in split function does exactly what your udf function does.

Or you can change your udf function to handle null explicitly:

val getListUdf = udf((raw: String) => raw match {
  case null => Array.empty[String]
  case _    => raw.split("\\|")
})
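Note that both snippets above escape the pipe as "\\|", while the question's udf uses a bare "|". That difference matters on its own: Scala's split on a String delegates to java.lang.String.split, which treats its argument as a regular expression, and an unescaped "|" is an alternation of two empty patterns that matches between every character. A plain Java sketch of the underlying behavior (class name is just for illustration):

```java
public class SplitEscapeDemo {
    public static void main(String[] args) {
        // Unescaped "|" is a regex matching the empty string, so the input
        // is split between every character: ["1", "|", "2", "|", "3"].
        String[] bad = "1|2|3".split("|");
        System.out.println(bad.length); // 5

        // Escaping the pipe splits on the literal '|' character.
        String[] good = "1|2|3".split("\\|");
        System.out.println(String.join(",", good)); // 1,2,3
    }
}
```

So even on non-null rows, the original udf would not have produced the intended Array("1", "2", "3").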
