I'm working in PySpark and want to save a simple DataFrame to MongoDB as a nested JSON structure. The DataFrame's schema is as follows:
root
|-- Name: string (nullable = true)
|-- Age: integer (nullable = true)
|-- City: string (nullable = true)
|-- Contact number: integer (nullable = true)
I use the write method of the DataFrameWriter API to save the DataFrame in JSON format, with the mongo-spark-connector package:
output_df.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()
This is straightforward. Each row of the DataFrame becomes its own MongoDB document, like this:
{
"Name": "John",
"Age": 24,
"City": "Melbourne",
"Contact number": 123456
}
{
"Name": "Wauldron",
"Age": 49,
"City": "LA",
"Contact number": 987654
}
These are separate documents. However, I want to save them in a nested structure, like this:
"Personal Details": [{
"Name": "John",
"Age": 24,
"City": "Melbourne",
"Contact number": 123456
},
{
"Name": "Wauldron",
"Age": 49,
"City": "LA",
"Contact number": 987654
}]
I can't get this to work. Any help would be appreciated.
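
To make the target shape concrete, here is a minimal plain-Python sketch of the transformation I'm after. It is not a Spark solution. It only shows every row being collected into one array under a single key. The helper name nest_rows and its key parameter are hypothetical, made up for illustration; the sample rows are the two from above.

```python
from typing import Any, Dict, List


def nest_rows(
    rows: List[Dict[str, Any]], key: str = "Personal Details"
) -> Dict[str, List[Dict[str, Any]]]:
    """Wrap all row dictionaries in one document under a single array-valued key.

    Hypothetical helper for illustration only; it is not part of PySpark
    or of the MongoDB connector.
    """
    # Copy each row so the nested document does not alias the caller's dicts.
    return {key: [dict(row) for row in rows]}


# The two sample rows from the question, one dict per DataFrame row.
rows = [
    {"Name": "John", "Age": 24, "City": "Melbourne", "Contact number": 123456},
    {"Name": "Wauldron", "Age": 49, "City": "LA", "Contact number": 987654},
]

# One document holding every row, instead of one document per row.
nested = nest_rows(rows)
```

In PySpark, the same grouping might be expressed before the write with collect_list over struct from pyspark.sql.functions, for example output_df.agg(F.collect_list(F.struct(*output_df.columns)).alias("Personal Details")). That is untested against the connector, and it assumes the combined array stays under MongoDB's 16 MB document size limit.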