I have a column that contains a list of key/value objects:
+----+--------------------------------------------------------------------------------------------+
|ID | Settings |
+----+--------------------------------------------------------------------------------------------+
|1 | [{"key":"key1","value":"val1"}, {"key":"key2","value":"val2"}, {"key":"key3","value":"val3"}] |
+----+--------------------------------------------------------------------------------------------+
Is it possible to split this list of objects out into a single row, with one column per key? Like this:
+----+------+-------+-------+
|ID | key1 | key2 | key3 |
+----+------+-------+-------+
|1 | val1 | val2 | val3 |
+----+------+-------+-------+
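I tried exploding the list and then parsing each element into a struct:
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.functions.{explode, from_json}
case class Setting(key: String, value: String)
val newDF = df.withColumn("setting", explode($"settings"))
  .select($"id", from_json($"setting", Encoders.product[Setting].schema).as("settings"))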
That gave me:
+------+------------------------------+
|ID |settings |
+------+------------------------------+
|1 |[key1,val1] |
|1 |[key2,val2] |
|1 |[key3,val3] |
+------+------------------------------+
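From here I can read the key of a given row through settings.key, but that is not what I need. I need to access multiple keys within a single row of data.
Answer 0 (score: 5)
You are almost there. If you already have this: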
+------+------------------------------+
|ID |settings |
+------+------------------------------+
|1 |[key1,val1] |
|1 |[key2,val2] |
|1 |[key3,val3] |
+------+------------------------------+
you can now use pivot to reshape the data:
newDF.groupBy($"ID")
.pivot("settings.key")
.agg(first("settings.value"))
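Group by ID, pivot on settings.key, and use agg with first to pick the value for each key. Any other aggregate function can be used here instead of first.
Output: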
+---+----+----+----+
|ID |key1|key2|key3|
+---+----+----+----+
|1 |val1|val2|val3|
+---+----+----+----+
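For reference, the whole pipeline from the question and this answer can be put together as a minimal, self-contained sketch. It assumes the Settings column is an array of JSON strings (which is what explode followed by from_json in the question implies) and that Spark runs in local mode; the object name PivotSettings and the helpers sample and widen are made up for illustration. It also flattens the struct into plain key and value columns before pivoting, which is a stylistic choice rather than a requirement.

```scala
import org.apache.spark.sql.{DataFrame, Encoders, SparkSession}
import org.apache.spark.sql.functions.{col, explode, first, from_json}

// Shape of one element of the Settings list (same case class as in the question).
case class Setting(key: String, value: String)

// Hypothetical helper object, named here only for illustration.
object PivotSettings {

  // Builds the sample data from the question.
  // Assumption: Settings is an array of JSON strings.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (1, Seq(
        """{"key":"key1","value":"val1"}""",
        """{"key":"key2","value":"val2"}""",
        """{"key":"key3","value":"val3"}"""))
    ).toDF("ID", "Settings")
  }

  // Explodes the list, parses each element, then pivots keys into columns.
  def widen(df: DataFrame): DataFrame = {
    val schema = Encoders.product[Setting].schema
    df
      // One row per list element.
      .withColumn("setting", explode(col("Settings")))
      // Parse the JSON string into a struct with fields key and value.
      .select(col("ID"), from_json(col("setting"), schema).as("parsed"))
      // Flatten the struct so the pivot works on plain columns.
      .select(col("ID"), col("parsed.key").as("key"), col("parsed.value").as("value"))
      // One output row per ID, one column per distinct key.
      .groupBy("ID")
      .pivot("key")
      .agg(first("value"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pivot-settings")
      .getOrCreate()
    widen(sample(spark)).show(false)
    spark.stop()
  }
}
```

If the set of keys is known in advance, pivot also accepts an explicit list of values, for example .pivot("key", Seq("key1", "key2", "key3")), which spares Spark the extra pass it otherwise needs to discover the distinct keys.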
Hope this helps!