I want to know how to convert a PySpark DataFrame to JSON format.
name    | type
'james' | 'message' -> 4, 'text' -> 3
'kane'  | 'message' -> 2, 'text' -> 3
---------------------------- Result ----------------------------
Convert the DataFrame to JSON format:
data = [
    {'name': 'james', 'message': 4, 'text': 3},
    {'name': 'kane', 'message': 2, 'text': 3}
]
How can I change the DataFrame into JSON data?
Answer 0 (score: 0)
Try this -
df.show(false)
df.printSchema()
/**
* +-----+-------------------------+
* |name |type |
* +-----+-------------------------+
* |james|[message -> 4, text -> 3]|
* |kane |[message -> 2, text -> 3]|
* +-----+-------------------------+
*
* root
* |-- name: string (nullable = false)
* |-- type: map (nullable = false)
* | |-- key: string
* | |-- value: integer (valueContainsNull = false)
*/
val p = df.select(to_json(collect_list(map_concat(col("type"), map(lit("name"), $"name")))).as("data"))
p.show(false)
/**
* +------------------------------------------------------------------------------------+
* |data |
* +------------------------------------------------------------------------------------+
* |[{"message":"4","text":"3","name":"james"},{"message":"2","text":"3","name":"kane"}]|
* +------------------------------------------------------------------------------------+
*/
println(p.head().getString(0))
/**
* [{"message":"4","text":"3","name":"james"},{"message":"2","text":"3","name":"kane"}]
*/
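The transformation the Scala snippet performs can be illustrated in plain Python (no Spark needed): each row's "type" map is merged with its "name" into one flat record, and the list of records is serialized to a single JSON string. This is a minimal sketch of the same logic, assuming the two rows from the question; the `rows` data is reconstructed for illustration.

```python
import json

# Rows shaped like the question's DataFrame: a name plus a map column.
rows = [
    {"name": "james", "type": {"message": 4, "text": 3}},
    {"name": "kane", "type": {"message": 2, "text": 3}},
]

# Merge each row's map with its name into one flat record, mirroring
# map_concat(col("type"), map(lit("name"), $"name")) above.
records = [{**row["type"], "name": row["name"]} for row in rows]

# Serialize the whole list to one JSON string, mirroring to_json(collect_list(...)).
data = json.dumps(records)
print(data)
```

Note that Spark's `to_json` in the output above stringifies the integer values (`"4"` instead of `4`), because merging the name into the map forces a `map<string,string>`; the plain-Python version keeps them as integers.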
Answer 1 (score: 0)
Try this.
from pyspark.sql import functions as f

df.withColumn('data', f.map_concat('type', f.map_from_entries(f.array(f.struct(f.lit('name'), f.col('name')))))) \
  .groupBy().agg(f.collect_list('data').alias('data')) \
  .withColumn('data', f.to_json(f.struct('data'))) \
  .show(10, False)
+-----------------------------------------------------------------------------------------------------+
|data |
+-----------------------------------------------------------------------------------------------------+
|{"data":[{"text":"3.0","message":"3.0","name":"kane"},{"message":"4.0","text":"2.0","name":"james"}]}|
+-----------------------------------------------------------------------------------------------------+
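Once you have the JSON string on the driver (e.g. from `p.head().getString(0)` in Answer 0, or the single row collected in Answer 1), it can be turned back into Python objects with the standard-library `json` module. A minimal sketch, assuming a string shaped like the outputs above (values appear as strings because `to_json` stringified the map values):

```python
import json

# Hypothetical JSON string matching the shape of the output shown above.
json_str = '[{"message":"4","text":"3","name":"james"},{"message":"2","text":"3","name":"kane"}]'

records = json.loads(json_str)
for r in records:
    print(r["name"], r["message"])
```

Convert the string values back to integers with `int(...)` if the downstream consumer needs numbers rather than strings.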