I need the number of rows in a JSON text where A.adList.optionalField = null.
The JSON looks like this:
{
"A":{
"adList":[
{
"a":"qwfqw"
},
{
"b":"fqw",
"c":23423,
"optionalField":null
}
]
}
}
This works:
df.select(df("id")).where(array_contains(df("A.adList.optionalField"),4)).registerTempTable("hb")
select count(*) from hb
However, I cannot do the same for NULL:
df.select(df("id")).where(array_contains(df("A.adList.optionalField"),"null")).registerTempTable("hb")
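Any ideas on how I can do this easily? The question "Check if arraytype column contains null" discusses possible NULLs in a Seq[Int]. As shown above, I am dealing with a field that may be NULL inside the structs of a Seq[Struct].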
Answer 0: (score: -1)
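array_contains() does not allow null as its second argument. To check whether the array contains a null, you can call sort_array() with ascending = true, which places null elements first. If the first element of the sorted array is null, the array contains a null, and you can test for that with isnull(sort_array(col(a), true)(0)).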
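Applied to the count from the question, the idea might look like the sketch below. It is an illustration under assumptions, not a tested solution for the real data: the two inline JSON records (the second one is made up), the local SparkSession, and the names `records`, `df` and `rowsWithNull` are hypothetical. It also assumes every adList array has at least one element, since indexing an empty array returns null or fails depending on the ANSI setting.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, isnull, sort_array}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("null-in-array").getOrCreate()
import spark.implicits._

// Hypothetical input: the record from the question plus one record without nulls.
val records = Seq(
  """{"A":{"adList":[{"a":"qwfqw"},{"b":"fqw","c":23423,"optionalField":null}]}}""",
  """{"A":{"adList":[{"b":"x","c":1,"optionalField":"set"}]}}"""
)
val df = spark.read.json(records.toDS())

// A.adList.optionalField resolves to an array of strings, one entry per struct.
// Ascending sort_array puts nulls first, so element 0 is null exactly when
// the (non-empty) array holds at least one null.
val rowsWithNull = df
  .where(isnull(sort_array(col("A.adList.optionalField"), true)(0)))
  .count()

println(rowsWithNull) // 1
```

Note that a struct that omits optionalField altogether also yields null in this array, so this test does not distinguish a missing field from an explicit null.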
Check this out:
scala> val df = spark.read.format("json").option("multiLine","true").load("/tmp/stack/tanvi.json").toDF("id")
df: org.apache.spark.sql.DataFrame = [id: struct<adList: array<struct<a:string,b:string,c:bigint,optionalField:string>>>]
scala> df.printSchema
root
|-- id: struct (nullable = true)
| |-- adList: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- a: string (nullable = true)
| | | |-- b: string (nullable = true)
| | | |-- c: long (nullable = true)
| | | |-- optionalField: string (nullable = true)
scala> df.select(sort_array(df("id.adList.optionalField"),true)(0),size(df("id.adList.optionalField"))).show(false)
+---------------------------------------------------------------+------------------------------------------------+
|sort_array(id.adList.optionalField AS `optionalField`, true)[0]|size(id.adList.optionalField AS `optionalField`)|
+---------------------------------------------------------------+------------------------------------------------+
|null |2 |
+---------------------------------------------------------------+------------------------------------------------+
scala> df.select(sort_array(df("id.adList.optionalField"),true)(1),size(df("id.adList.optionalField"))).show(false)
+---------------------------------------------------------------+------------------------------------------------+
|sort_array(id.adList.optionalField AS `optionalField`, true)[1]|size(id.adList.optionalField AS `optionalField`)|
+---------------------------------------------------------------+------------------------------------------------+
|null |2 |
+---------------------------------------------------------------+------------------------------------------------+
scala> df.select(isnull(sort_array(df("id.adList.optionalField"),true)(0)),size(df("id.adList.optionalField"))).show(false)
+-------------------------------------------------------------------------+------------------------------------------------+
|(sort_array(id.adList.optionalField AS `optionalField`, true)[0] IS NULL)|size(id.adList.optionalField AS `optionalField`)|
+-------------------------------------------------------------------------+------------------------------------------------+
|true |2 |
+-------------------------------------------------------------------------+------------------------------------------------+
scala>
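As an alternative to the sort_array approach, and not something the answer above uses, Spark 3.0 and later provide the higher-order function `exists` in `org.apache.spark.sql.functions`, which tests a predicate against each array element. The following is a minimal sketch assuming that Spark version and the column names from the question; the inline records (the second is invented), the local session, and the names `records`, `df` and `rowsWithNullField` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, exists}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("exists-null").getOrCreate()
import spark.implicits._

// Hypothetical input: the record from the question plus one record without nulls.
val records = Seq(
  """{"A":{"adList":[{"a":"qwfqw"},{"b":"fqw","c":23423,"optionalField":null}]}}""",
  """{"A":{"adList":[{"b":"x","c":1,"optionalField":"set"}]}}"""
)
val df = spark.read.json(records.toDS())

// Keep rows where at least one struct in adList has a null optionalField.
val rowsWithNullField = df
  .where(exists(col("A.adList"), ad => ad.getField("optionalField").isNull))
  .count()

println(rowsWithNullField) // 1
```

This avoids sorting and does not index into the array, so it does not rely on the array being non-empty. As with the sorted version, a struct that lacks the field counts as null.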