I am new to Spark and Scala and am trying to learn Spark for one of my learning projects. I have a JSON file that looks like this:
[
{
"year": 2012,
"month": 8,
"title": "Batman"
},
{
"year": 2012,
"month": 8,
"title": "Hero"
},
{
"year": 2012,
"month": 7,
"title": "Robot"
}
]
I started by reading this JSON into a Spark DataFrame, so I tried the following:
spark.read
.option("multiline", true)
.option("mode", "PERMISSIVE")
.option("inferSchema", true)
.json(filePath)
It reads the JSON, but flattens the data into separate Spark columns. What I need instead is to keep each data object as a single raw JSON string, one row per object.
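To make the problem concrete, here is a sketch of what the read above produces (assuming filePath points at the file shown; the column order and types come from Spark's JSON schema inference):

val flat = spark.read
  .option("multiline", true)
  .json(filePath)

// Each JSON field becomes its own typed column (inferred fields are ordered
// alphabetically), instead of one raw JSON string per object.
flat.printSchema()
// root
//  |-- month: long (nullable = true)
//  |-- title: string (nullable = true)
//  |-- year: long (nullable = true)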
I want to read it into a Spark DataFrame where the expected output looks like this:
+----------------------------------------+
|json |
+----------------------------------------+
|{"year":2012,"month":8,"title":"Batman"}|
|{"year":2012,"month":8,"title":"Hero"} |
|{"year":2012,"month":7,"title":"Robot"} |
|{"year":2011,"month":7,"title":"Git"} |
+----------------------------------------+
Answer 0 (score: 0)
Use toJSON:
val df = spark.read
.option("multiline", true)
.option("mode", "PERMISSIVE")
.option("inferSchema", true)
.json(filePath).toJSON
Now:
df.show(false)
+----------------------------------------+
|value |
+----------------------------------------+
|{"month":8,"title":"Batman","year":2012}|
|{"month":8,"title":"Hero","year":2012} |
|{"month":7,"title":"Robot","year":2012} |
+----------------------------------------+
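Note that toJSON serializes each row back to a JSON string using the inferred schema (which is why the fields come out in alphabetical order) and returns a Dataset[String] whose single column is named value. If you want the column to be called json as in the expected output, you can rename it. A minimal sketch, assuming the same filePath as above:

val jsonDf = spark.read
  .option("multiline", true)
  .option("mode", "PERMISSIVE")
  .json(filePath)
  .toJSON          // Dataset[String] with a single column named "value"
  .toDF("json")    // rename the column to "json"

jsonDf.show(false)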