I have a JSON file that looks like this:
{"name":"Michael", "cities":["palo alto", "menlo park"], "schools":[{"sname":"stanford", "year":2010}, {"sname":"berkeley","year":2012}]}
I want to store the output in a CSV file, as follows:
Michael,{"sname":"stanford", "year":2010}
Michael,{"sname":"berkeley", "year":2012}
I tried the following:
val people = sqlContext.read.json("people.json")
val flattened = people.select($"name", explode($"schools").as("schools_flat"))
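The code above does not produce schools_flat as JSON; each exploded element is a struct instead. Any ideas on how to get the expected output?
Thanks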
Answer 0: (score: 0)
You need to specify the schema explicitly so that the JSON file is read the way you want. In this case it would look like this:
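import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType
// Declaring schools as Array[String] makes Spark keep each array element as its raw JSON text
case class json_schema_class( cities: String, name : String, schools: Array[String])
// Derive a StructType from the case class and use it in place of schema inference
val json_schema = ScalaReflection.schemaFor[json_schema_class].dataType.asInstanceOf[StructType]
val people = sqlContext.read.schema( json_schema ).json("people.json")
// One row per school; schools_flat is now a JSON string rather than a struct
val flattened = people.select($"name", explode($"schools").as("schools_flat"))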
The 'flattened' DataFrame looks like this:
+-------+--------------------+
|   name|        schools_flat|
+-------+--------------------+
|Michael|{"sname":"stanfor...|
|Michael|{"sname":"berkele...|
+-------+--------------------+
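The answer stops at the DataFrame, while the question asks for a CSV file. As a rough sketch of an alternative that keeps the inferred schema, the exploded struct can be turned back into a JSON string with `to_json` before writing. This assumes Spark 2.2 or later (a `SparkSession` instead of the `sqlContext` used above); the helper name `flattenSchools` and the output directory `people_flat_csv` are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, to_json}

// Hypothetical helper (not from the original answer): one row per (name, school),
// with the school struct serialized back to a JSON string.
def flattenSchools(people: DataFrame): DataFrame =
  people
    .select(col("name"), explode(col("schools")).as("school"))
    .select(col("name"), to_json(col("school")).as("schools_flat"))

// Assumption: a local session; in spark-shell the existing `spark` session is returned.
val spark = SparkSession.builder().master("local[*]").appName("flatten-schools").getOrCreate()
import spark.implicits._

// The sample record from the question, inlined so the sketch needs no input file.
val sample = """{"name":"Michael", "cities":["palo alto", "menlo park"], "schools":[{"sname":"stanford", "year":2010}, {"sname":"berkeley","year":2012}]}"""
val people = spark.read.json(Seq(sample).toDS())

val flattened = flattenSchools(people)
flattened.show(truncate = false)

// Assumption: "people_flat_csv" is an illustrative output directory name.
// Spark writes a directory of part files, not a single CSV file.
flattened.write.mode("overwrite").csv("people_flat_csv")
```

Because the JSON text contains commas and double quotes, the CSV writer will quote and escape the schools_flat field, so the file will not match the unquoted lines in the question character for character. If the exact layout matters, concatenating the two columns into one string and writing it with `write.text` may be closer to what is wanted.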