I have a DataFrame df2 with a JSON array column json1, built as follows:
import org.apache.spark.sql.functions._ // to_json, struct, collect_list, lit, concat
import spark.implicits._                // $-columns and toDF (spark-shell SparkSession)

val df1 = Seq(
  (1, 11, "n1", "d1"),
  (2, 11, "n3", "d3")
).toDF("id1", "id2", "number", "data")

val df2 = df1.withColumn("json", to_json(struct($"number", $"data")))
  .groupBy("id1", "id2").agg(collect_list($"json").alias("json1"))
The contents of df2 are:
+---+---+-----------------------------+
|id1|id2|json1 |
+---+---+-----------------------------+
|1 |11 |[{"number":"n1","data":"d1"}]|
|2 |11 |[{"number":"n3","data":"d3"}]|
+---+---+-----------------------------+
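Note that json1 is not a single JSON string but array&lt;string&gt; (one JSON object string per collected row), which can be checked directly:

df2.printSchema() // json1 shows up as array<string>, not string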
I am trying to build another JSON column, say json2, whose key is a string passed in by the caller (for example key1) and whose value is the data from json1:
+---+---+--------------------------------------+
|id1|id2|json2 |
+---+---+--------------------------------------+
|1 |11 |{"key1":[{"number":"n1","data":"d1"}]}|
|2 |11 |{"key1":[{"number":"n3","data":"d3"}]}|
+---+---+--------------------------------------+
To achieve this I tried the lit and concat functions:
val df3 = df2.withColumn("json2", concat(lit("{" + "\"key1\"" + ":"), col("json1") ,lit("}")))
but when I run it I get an AnalysisException:
org.apache.spark.sql.AnalysisException: cannot resolve 'concat('{"key1":', `json1`, '}')' due to data type mismatch: argument 2 requires string type, however, '`json1`' is of array<string> type.;;
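From the error it looks like json1 (array&lt;string&gt;) would first need to be flattened into a single string. A sketch of what I had in mind, joining the array elements with concat_ws before wrapping them in the key1 brackets (I am not sure this is the idiomatic way):

val df3 = df2.withColumn(
  "json2",
  // join the array elements into one string, then wrap with {"key1":[ ... ]}
  concat(lit("{\"key1\":["), concat_ws(",", col("json1")), lit("]}"))
)

Is there a cleaner way to nest json1 under a key passed as a string?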
Versions:
Spark: 2.2
Scala: 2.11