Convert a Spark DataFrame with a schema to a DataFrame of JSON strings

Time: 2018-01-14 18:05:56

Tags: json scala apache-spark spark-dataframe

I have a DataFrame like this:

[DataFrame listing not preserved]

The schema of my df looks like this:

[schema listing not preserved]

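I want to convert each row to a JSON string, given that I know my schema. The resulting DataFrame would have a single string column containing the JSON. The first row should look like this: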

[JSON example not preserved]

and the second row of the DataFrame should look like this:

[JSON example not preserved]

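My goal is not to write the DataFrame to a JSON file. My goal is to convert df1 into a second DataFrame, df2, so that each JSON row of df2 can be pushed to a Kafka topic.

I have this code to create the DataFrame: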

[code listing not preserved]
Any ideas?

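2 answers:

Answer 0: (score: 2)

If all you need is a single-column DataFrame/Dataset in which each value is the JSON representation of one row of the original DataFrame, you can simply apply toJSON to your DataFrame, as follows: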

df.show
// +---+------------------------------+---+--------+------+-------------+
// |age|creditcards                   |id |lastname|name  |timestamp    |
// +---+------------------------------+---+--------+------+-------------+
// |35 |[[hr6,3569823], [ee3,1547869]]|1  |blanc   |michel|1496756626921|
// |25 |[[ye8,4569872], [qe5,3485762]]|2  |barns   |peter |1496756626551|
// +---+------------------------------+---+--------+------+-------------+

val dsJson = df.toJSON
// dsJson: org.apache.spark.sql.Dataset[String] = [value: string]

dsJson.show
// +--------------------------------------------------------------------------+
// |value                                                                     |
// +--------------------------------------------------------------------------+
// |{"age":"35","creditcards":[{"id":"hr6","number":"3569823"},{"id":"ee3",...|
// |{"age":"25","creditcards":[{"id":"ye8","number":"4569872"},{"id":"qe5",...|
// +--------------------------------------------------------------------------+

[UPDATE]

To add name as an additional column, you can extract it from the JSON column with from_json:

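// Needed outside spark-shell, which already imports both of these
import org.apache.spark.sql.functions.from_json
import spark.implicits._  // enables the $"..." column syntax

val result = dsJson.withColumn("name", from_json($"value", df.schema)("name"))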

result.show
// +--------------------+------+
// |               value|  name|
// +--------------------+------+
// |{"age":"35","cred...|michel|
// |{"age":"25","cred...| peter|
// +--------------------+------+
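
As an alternative to serializing with toJSON and then parsing the string again with from_json, the JSON column and the name column can be produced in a single select using to_json over a struct of all columns. The following is only a sketch under assumptions: the case classes, the object name and the local SparkSession are hypothetical, and the all-string field types are inferred from the quoted values in the output above, not from the asker's actual schema.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct, to_json}

object ToJsonColumnSketch {
  // Hypothetical case classes mirroring the columns shown by df.show above.
  // All fields are strings because the JSON output quotes every value.
  case class CreditCard(id: String, number: String)
  case class Person(age: String, creditcards: Seq[CreditCard], id: String,
                    lastname: String, name: String, timestamp: String)

  // Rebuilds the two sample rows from the answer as a DataFrame.
  def sampleDf(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      Person("35", Seq(CreditCard("hr6", "3569823"), CreditCard("ee3", "1547869")),
             "1", "blanc", "michel", "1496756626921"),
      Person("25", Seq(CreditCard("ye8", "4569872"), CreditCard("qe5", "3485762")),
             "2", "barns", "peter", "1496756626551")
    ).toDF()
  }

  // Packs every column into one struct, serializes it to a JSON string,
  // and keeps the original `name` column next to it.
  def withJsonValue(df: DataFrame): DataFrame =
    df.select(
      to_json(struct(df.columns.map(col): _*)).as("value"),
      col("name")
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")              // assumption: local run for illustration
      .appName("to-json-column-sketch")
      .getOrCreate()
    withJsonValue(sampleDf(spark)).show(false)
    spark.stop()
  }
}
```

Because the name column is taken from the original DataFrame, this variant avoids a JSON parse per row; the from_json approach remains the one to use when only the JSON strings are available.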

Answer 1: (score: 1)

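You can convert the DataFrame directly to a Dataset of JSON strings using

val jsonDataset: Dataset[String] = df.toJSON

and turn that into a DataFrame using

val jsonDF: DataFrame = jsonDataset.toDF

The JSON fields follow the DataFrame's column order, which is alphabetical here, so the output of

jsonDF show false

will be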

    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |value                                                                                                                                                               |
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |{"age":"35","creditcards":[{"id":"hr6","number":"3569823"},{"id":"ee3","number":"1547869"}],"id":"1","lastname":"blanc","name":"michel","timestamp":"1496756626921"}|
    |{"age":"25","creditcards":[{"id":"ye8","number":"4569872"},{"id":"qe5","number":"3485762"}],"id":"2","lastname":"barns","name":"peter","timestamp":"1496756626551"} |
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
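
Since the question's stated goal is to push each JSON row to a Kafka topic, here is a hedged sketch of how a DataFrame like jsonDF could be written with Spark's batch Kafka sink. It assumes Spark 2.2 or later with the spark-sql-kafka-0-10 connector on the classpath; the object name, the helper names, and the two shortened sample records are hypothetical, and the broker address and topic must be supplied by the caller.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object KafkaPushSketch {
  // The Kafka sink expects a `value` column (string or binary);
  // an optional `key` column is not set here.
  def kafkaReady(jsonDF: DataFrame): DataFrame =
    jsonDF.selectExpr("CAST(value AS STRING) AS value")

  // Writes each row's `value` as one Kafka record (batch write).
  def pushToKafka(jsonDF: DataFrame, bootstrapServers: String, topic: String): Unit =
    kafkaReady(jsonDF)
      .write
      .format("kafka")
      .option("kafka.bootstrap.servers", bootstrapServers)
      .option("topic", topic)
      .save()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")              // assumption: local run for illustration
      .appName("kafka-push-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for jsonDF = df.toJSON.toDF; records shortened for brevity.
    val jsonDF = Seq(
      """{"id":"1","name":"michel"}""",
      """{"id":"2","name":"peter"}"""
    ).toDF("value")

    args match {
      // e.g. args = Array("localhost:9092", "people-json"), both placeholders
      case Array(servers, topic) => pushToKafka(jsonDF, servers, topic)
      // No broker given: only preview what would be sent.
      case _ => kafkaReady(jsonDF).show(false)
    }
    spark.stop()
  }
}
```

The single column produced by toJSON.toDF is already named value, so it matches what the Kafka sink looks for; the cast is kept only to make that requirement explicit.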