I have a DataFrame like this:
The schema of my df looks like this:
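I want to convert each row into a JSON string, given that I know the schema, so the resulting DataFrame would have a single column containing the JSON string. The first row should look like this: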
And the second row of the DataFrame should look like this:
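My goal is not to write the DataFrame to a JSON file. My goal is to convert df1 into a second DataFrame, df2, so that each JSON row of df2 can be pushed to a Kafka topic. I have this code to create the DataFrame: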
Any ideas?
Answer 0 (score: 2)
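If you only need a single-column DataFrame / Dataset in which each value is the JSON representation of one row of the original DataFrame, you can simply apply toJSON to your DataFrame, as follows: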
df.show
// +---+------------------------------+---+--------+------+-------------+
// |age|creditcards |id |lastname|name |timestamp |
// +---+------------------------------+---+--------+------+-------------+
// |35 |[[hr6,3569823], [ee3,1547869]]|1 |blanc |michel|1496756626921|
// |25 |[[ye8,4569872], [qe5,3485762]]|2 |barns |peter |1496756626551|
// +---+------------------------------+---+--------+------+-------------+
val dsJson = df.toJSON
// dsJson: org.apache.spark.sql.Dataset[String] = [value: string]
dsJson.show
// +--------------------------------------------------------------------------+
// |value |
// +--------------------------------------------------------------------------+
// |{"age":"35","creditcards":[{"id":"hr6","number":"3569823"},{"id":"ee3",...|
// |{"age":"25","creditcards":[{"id":"ye8","number":"4569872"},{"id":"qe5",...|
// +--------------------------------------------------------------------------+
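Since the question's stated goal is to push each JSON row to a Kafka topic, here is a minimal sketch of how the Dataset returned by toJSON could be written out. It assumes Spark 2.2 or later with the spark-sql-kafka-0-10 package on the classpath; the broker address and topic name are hypothetical placeholders, not values from the question.

```scala
import org.apache.spark.sql.{DataFrame, Dataset}

object JsonRowsToKafka {
  // Hypothetical values: replace with your own broker list and topic name.
  val brokers: String = "localhost:9092"
  val topic: String = "people-json"

  // Serializes every row of df to a JSON string and writes it to Kafka.
  def pushAsJson(df: DataFrame): Unit = {
    // toJSON yields a Dataset[String] whose single column is named "value",
    // which is the column the Kafka sink uses as the message payload.
    val dsJson: Dataset[String] = df.toJSON

    dsJson.write
      .format("kafka")
      .option("kafka.bootstrap.servers", brokers)
      .option("topic", topic)
      .save()
  }
}
```

No key column is set here, so messages are sent without a key; if partitioning by key matters, a string or binary "key" column would have to be added before writing.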
[UPDATE]
To add name as an additional column, you can use from_json to extract it from the JSON column:
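import org.apache.spark.sql.functions.from_json
import spark.implicits._ // enables the $"..." column syntax; assumes a SparkSession named spark
val result = dsJson.withColumn("name", from_json($"value", df.schema)("name"))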
result.show
// +--------------------+------+
// | value| name|
// +--------------------+------+
// |{"age":"35","cred...|michel|
// |{"age":"25","cred...| peter|
// +--------------------+------+
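As an alternative to parsing the JSON back with from_json, the JSON string and the name column can be built in a single pass over the original DataFrame with to_json and struct. This is a sketch of a different technique, not the answer's method; the object and method names are hypothetical, and it assumes the column names contain no dots or other characters that col would interpret specially.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, struct, to_json}

object JsonWithName {
  // Returns a DataFrame with two columns:
  //   value: the whole row serialized as a JSON string
  //   name:  the original name column, kept alongside the JSON
  def withJsonAndName(df: DataFrame): DataFrame =
    df.select(
      // Pack all columns into one struct, then serialize that struct to JSON.
      to_json(struct(df.columns.map(col): _*)).as("value"),
      col("name")
    )
}
```

This avoids serializing each row and then deserializing it again just to recover one field.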
Answer 1 (score: 1)
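For this, you can convert the DataFrame directly into a Dataset of JSON strings with
val jsonDataset: Dataset[String] = df.toJSON
You can then convert it to a DataFrame with
val jsonDF: DataFrame = jsonDataset.toDF
The JSON fields follow the DataFrame's column order, which is alphabetical here, so the output of
jsonDF.show(false)
will be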
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"age":"35","creditcards":[{"id":"hr6","number":"3569823"},{"id":"ee3","number":"1547869"}],"id":"1","lastname":"blanc","name":"michel","timestamp":"1496756626921"}|
|{"age":"25","creditcards":[{"id":"ye8","number":"4569872"},{"id":"qe5","number":"3485762"}],"id":"2","lastname":"barns","name":"peter","timestamp":"1496756626551"} |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
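For anyone who wants to try this end to end, the following is a self-contained sketch. The code that creates the DataFrame is not shown in the question, so the case classes below are a hypothetical reconstruction based on the columns and string values visible in the output above; a local SparkSession is also assumed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical case classes mirroring the columns shown in the output above.
// Every field is a String because the JSON output quotes every value.
final case class CreditCard(id: String, number: String)
final case class Person(
  age: String,
  creditcards: Seq[CreditCard],
  id: String,
  lastname: String,
  name: String,
  timestamp: String
)

object ToJsonExample {
  // Builds a two-row DataFrame with the same values as the sample output.
  def sampleDf(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      Person("35", Seq(CreditCard("hr6", "3569823"), CreditCard("ee3", "1547869")),
        "1", "blanc", "michel", "1496756626921"),
      Person("25", Seq(CreditCard("ye8", "4569872"), CreditCard("qe5", "3485762")),
        "2", "barns", "peter", "1496756626551")
    ).toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("to-json-example")
      .getOrCreate()

    // One JSON string per row, exposed as a single "value" column.
    val jsonDF: DataFrame = sampleDf(spark).toJSON.toDF()
    jsonDF.show(false)

    spark.stop()
  }
}
```

The case class fields are declared in alphabetical order so that the resulting column order, and therefore the JSON field order, matches the output shown above.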