I have a DataFrame with two array columns:
+---------+-----------------------+
|itemval  |fruit                  |
+---------+-----------------------+
|[1, 2, 3]|[apple, banana, orange]|
+---------+-----------------------+
I am trying to zip them together into name/value pairs:
+---------+-----------------------+--------------------------------------+
|itemval  |fruit                  |ziped                                 |
+---------+-----------------------+--------------------------------------+
|[1, 2, 3]|[apple, banana, orange]|[[1, apple], [2, banana], [3, orange]]|
+---------+-----------------------+--------------------------------------+
I then convert the result to JSON, but to_json produces output in the following format:
+---------------------------------------------------------------------------+
|ziped                                                                      |
+---------------------------------------------------------------------------+
|[{"_1":"1","_2":"apple"},{"_1":"2","_2":"banana"},{"_1":"3","_2":"orange"}]|
+---------------------------------------------------------------------------+
The format I expect is:
+------------------------------------------------------------------------------------------------+
|ziped                                                                                           |
+------------------------------------------------------------------------------------------------+
|[{"itemval":"1","name":"apple"},{"itemval":"2","name":"banana"},{"itemval":"3","name":"orange"}]|
+------------------------------------------------------------------------------------------------+
This is my implementation:
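import org.apache.spark.sql.functions.{to_json, udf}
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell

val df1 = Seq((Array(1, 2, 3), Array("apple", "banana", "orange"))).toDF("itemval", "fruit")
df1.show(false)

// Zips the two arrays into (itemval, fruit) tuples; tuple fields are named _1 and _2
def zipper = udf((list1: Seq[String], list2: Seq[String]) => {
  val zipList = list2 zip list1
  zipList
})

df1.withColumn("ziped", to_json(zipper($"fruit", $"itemval"))).drop("itemval", "fruit").show(false)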
Answer 0 (score: 0)
This is the solution that worked for me: create a schema with the new field names and cast the column to it.
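import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Target schema: same shape as the zipped tuples, but with the desired field names
val schema = ArrayType(
  StructType(
    Array(
      StructField("itemval", StringType),
      StructField("name", StringType)
    )
  )
)
// zival is not defined in this snippet; presumably it is the DataFrame holding the "ziped" array column
val casted = zival.withColumn("result", $"ziped".cast(schema))
casted.show(false)
casted.select(to_json($"result")).show(false)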
The output is:
casted: org.apache.spark.sql.DataFrame
  ziped: array
    element: struct
      _1: string
      _2: string
  result: array
    element: struct
      itemval: string
      name: string
+-----------------------------------------------------------------+
|structstojson(result)                                            |
+-----------------------------------------------------------------+
|[{"itemval":"3","name":"orange"},{"itemval":"2","name":"banana"}]|
+-----------------------------------------------------------------+
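As a possible alternative to casting, the UDF can return a case class instead of a tuple, since Spark derives struct field names from case class fields. The following is only a minimal sketch under some assumptions: `Item`, `ZipToJson`, and `zippedJson` are hypothetical names that do not appear in the question, the session is created locally for illustration, and `itemval` is converted to a string to match the expected output.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{to_json, udf}

// Hypothetical case class: its field names become the JSON keys.
// It is defined at the top level so Spark can derive a schema for it.
case class Item(itemval: String, name: String)

object ZipToJson {
  // Builds the sample DataFrame from the question and returns the JSON string
  // produced for its single row.
  def zippedJson(spark: SparkSession): String = {
    import spark.implicits._

    val df1 = Seq((Array(1, 2, 3), Array("apple", "banana", "orange")))
      .toDF("itemval", "fruit")

    // Zip the two arrays and wrap each pair in an Item, so the resulting
    // struct fields are named itemval and name rather than _1 and _2.
    val zipper = udf((vals: Seq[Int], names: Seq[String]) =>
      vals.zip(names).map { case (v, n) => Item(v.toString, n) })

    df1.select(to_json(zipper($"itemval", $"fruit")).as("ziped"))
      .first()
      .getString(0)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("zip-to-json")
      .getOrCreate()
    println(zippedJson(spark))
    spark.stop()
  }
}
```

With this approach no separate schema or cast is needed, at the cost of defining one small case class per output shape.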