I have a PySpark DataFrame:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("NPS_TF") \
    .getOrCreate()
df2 = spark.createDataFrame([
    ("unknown", 1, 2, 3)
], ["Assign", "xs[0]", "xs[1]", "xs[2]"])
df2.limit(1).show()
How can I convert a subset of the DataFrame's column names, together with the first row of data, into this JSON format:
{"fields": ["xs[0]", "xs[1]", "xs[2]"], "values": [[1,2,3]]}
Answer 0 (score: 0)
Try this solution:
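import json

df2 = spark.createDataFrame([("unknown", 1, 2, 3)], ["Assign", "xs[0]", "xs[1]", "xs[2]"])
fields = ['xs[0]', 'xs[1]', 'xs[2]']
# limit(1) keeps only the first row; a Row supports lookup by column name
values = df2.limit(1).rdd.map(lambda p: [p[field] for field in fields]).collect()
json_obj = {
    'fields': fields,
    'values': values
}
# Serialize the dict to a JSON string
print(json.dumps(json_obj))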
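The same reshaping can also be done on rows that have already been collected to the driver, which avoids the RDD round trip. The following is a minimal sketch, not part of the original answer: `rows_to_payload` and `sample_rows` are hypothetical names, and plain dicts stand in for collected `Row` objects so the snippet runs without a Spark session.

```python
import json
from typing import Any, Iterable, Sequence


def rows_to_payload(rows: Iterable[Any], fields: Sequence[str]) -> str:
    """Serialize the chosen fields of each row as {"fields": [...], "values": [[...]]}.

    Each row only needs to support lookup by column name (row[name]).
    """
    values = [[row[name] for name in fields] for row in rows]
    return json.dumps({"fields": list(fields), "values": values})


# Plain dicts stand in for collected Row objects (illustration only).
sample_rows = [{"Assign": "unknown", "xs[0]": 1, "xs[1]": 2, "xs[2]": 3}]
payload = rows_to_payload(sample_rows, ["xs[0]", "xs[1]", "xs[2]"])
print(payload)
```

Because a PySpark `Row` also supports lookup by column name, the same helper should work with `rows_to_payload(df2.limit(1).collect(), fields)` on the DataFrame from the question. Collecting is only reasonable here because `limit(1)` bounds the result to a single row.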