Adding a JSON key to a column and creating a new JSON column: AnalysisException

Date: 2019-02-13 12:23:57

Tags: json scala apache-spark dataframe

I have a JSON array column, json1, in a DataFrame, df2:

import org.apache.spark.sql.functions._
import spark.implicits._

val df1 = Seq(
  (1, 11, "n1", "d1"),
  (2, 11, "n3", "d3")
).toDF("id1", "id2", "number", "data")

val df2 = df1
  .withColumn("json", to_json(struct($"number", $"data")))
  .groupBy("id1", "id2")
  .agg(collect_list($"json").alias("json1"))
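For reference, a quick schema check (just the standard printSchema call) confirms that collect_list has produced json1 as array<string>, which becomes relevant below:

// json1 is an array of JSON strings, since collect_list gathers
// the to_json output of each row into an array (array<string>).
df2.printSchema()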

The content of df2 is:

+---+---+-----------------------------+
|id1|id2|json1                        |
+---+---+-----------------------------+
|1  |11 |[{"number":"n1","data":"d1"}]|
|2  |11 |[{"number":"n3","data":"d3"}]|
+---+---+-----------------------------+

I am trying to create another JSON column, say json2, using a passed-in string (for example key1) as the key and the data of json1 as the value:

+---+---+--------------------------------------+
|id1|id2|json2                                 |
+---+---+--------------------------------------+
|1  |11 |{"key1":[{"number":"n1","data":"d1"}]}|
|2  |11 |{"key1":[{"number":"n3","data":"d3"}]}|
+---+---+--------------------------------------+

To achieve this, I tried the lit and concat functions:

val df3 = df2.withColumn("json2", concat(lit("""{"key1":"""), col("json1"), lit("}")))

But when I run it, I get an AnalysisException:

org.apache.spark.sql.AnalysisException: cannot resolve 'concat('{"key1":', `json1`, '}')' due to data type mismatch: argument 2 requires string type, however, '`json1`' is of array<string> type.;;
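Since json1 is array<string> rather than string, concat cannot consume it directly. One possible workaround, a minimal sketch rather than a verified answer, is to join the array elements into a single string with the built-in concat_ws and rebuild the surrounding brackets by hand:

import org.apache.spark.sql.functions.{col, concat, concat_ws, lit}

// concat_ws accepts array<string> columns and joins the elements with the
// given separator, yielding a plain string that concat can work with.
// The [ and ] dropped by concat_ws are re-added in the literals.
val df3 = df2.withColumn(
  "json2",
  concat(lit("""{"key1":["""), concat_ws(",", col("json1")), lit("]}"))
)

On the sample data above this should produce {"key1":[{"number":"n1","data":"d1"}]}, matching the desired json2 column.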

Versions:

Spark: 2.2
Scala: 2.11

0 Answers:

No answers yet.