I have the following DataFrame:
| value |offset (these 2 are columns)
|{"Name":"myname","valid":"true"} | Guru
|{"Name":"myname1","valid":"false"}| Guru
I want to split it into 2 DataFrames, depending on whether "valid" in the value column is true or false:
| value |offset
|{"Name":"myname","valid":"true"} | Guru
| value |offset
|{"Name":"myname1","valid":"false"}| Guru
Answer 0 (score: 0)
Use get_json_object() to work with a field that contains a JSON string. See https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$@get_json_object(e:org.apache.spark.sql.Column,path:String):org.apache.spark.sql.Column
scala> val in = """value offset partition sourceSystem sourceName datePartition
| {"Name":"myname","valid":"true"} Guru 1 sda sajka ajsa
| {"Name":"myname1","valid":"false"} Guru 1 sda sajka ajsa"""
in: String =
value offset partition sourceSystem sourceName datePartition
{"Name":"myname","valid":"true"} Guru 1 sda sajka ajsa
{"Name":"myname1","valid":"false"} Guru 1 sda sajka ajsa
scala> val df = spark.read.option("header", true).option("sep", "\t").csv(in.split("\n").toSeq.toDS)
df: org.apache.spark.sql.DataFrame = [value: string, offset: string ... 4 more fields]
scala> df.where(get_json_object('value, "$.valid") === "true").show
+--------------------+------+---------+------------+----------+-------------+
|               value|offset|partition|sourceSystem|sourceName|datePartition|
+--------------------+------+---------+------------+----------+-------------+
|{"Name":"myname",...|  Guru|        1|         sda|     sajka|         ajsa|
+--------------------+------+---------+------------+----------+-------------+
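The session above only shows the "true" half. To get both DataFrames the question asks for, the same filter can be applied twice, once per value. The following is a minimal sketch, not a definitive implementation: the object name SplitByValid and the helper splitByValid are made up for illustration, the two sample rows are taken from the question, and the local SparkSession is an assumption for running outside spark-shell.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, get_json_object}

object SplitByValid {
  // Hypothetical helper (not part of Spark): returns a pair of DataFrames,
  // first the rows whose JSON "valid" field is "true", then those where it is "false".
  def splitByValid(df: DataFrame): (DataFrame, DataFrame) = {
    // get_json_object extracts the field as a string column.
    val valid = get_json_object(col("value"), "$.valid")
    (df.where(valid === "true"), df.where(valid === "false"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("split-by-valid")
      .getOrCreate()

    // The two sample rows from the question.
    val df = spark.createDataFrame(Seq(
      ("""{"Name":"myname","valid":"true"}""", "Guru"),
      ("""{"Name":"myname1","valid":"false"}""", "Guru")
    )).toDF("value", "offset")

    val (validDf, invalidDf) = splitByValid(df)
    validDf.show(false)   // rows with valid == "true"
    invalidDf.show(false) // rows with valid == "false"

    spark.stop()
  }
}
```

Note that get_json_object returns null when the string is not valid JSON or the path is missing, so such rows match neither filter and end up in neither DataFrame.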