How to parse nested JSON objects in Spark

Time: 2016-11-03 11:09:04

Tags: json apache-spark struct apache-spark-sql

I have a JSON file with the following schema:

root
 |-- demo: boolean (nullable = true)
 |-- person: struct (nullable = true)
 |    |-- dateOfBirth: string (nullable = true)
 |    |-- email: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- emergencyContacts: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- name: string (nullable = true)
 |    |    |    |-- phone: string (nullable = true)
 |    |    |    |-- relationship: string (nullable = true)
 |    |-- id: long (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- phones: struct (nullable = true)
 |    |    |-- home: string (nullable = true)
 |    |    |-- mobile: string (nullable = true)
 |    |-- registered: boolean (nullable = true)
 |-- product: string (nullable = true)
 |-- releaseDate: string (nullable = true)

I want to parse the emergencyContacts array to get the names of the contacts.

So far I have extracted the person struct like this:

// Load the JSON file; read.json already returns a DataFrame, so toDF() is not needed.
val df = sqlContext.read.json("file:///home/training211/test/cjson1.json")
df.registerTempTable("df")
df.printSchema()
// Keep only the nested person struct.
val person = df.select("person")
person.registerTempTable("person")
person.printSchema()
person.show()

Whenever I try to go one level deeper, I get an error: org.apache.spark.sql.AnalysisException: cannot resolve 'persons.emergencyContacts' given input columns: [person];

I also tried:

val arrayFlatten = df.select($"person.emergencyContacts".getItem(0)) 

which gave me:

+---------------------------+
|person.emergencyContacts[0]|
+---------------------------+
|       [Jane Doe,888-555...|
+---------------------------+

But that is not the result I want.

Any help is appreciated.

1 answer:

Answer 0: (score: 0)

Can you try the following?

df.select($"person.emergencyContacts").show

If you want to get the phone, you can do this:

df.select($"person.emergencyContacts.phone").show
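
The same field access should also work for the contact names, which is what the question asks for. Below is a minimal sketch, assuming Spark 2.2 or later (it uses SparkSession and read.json on a Dataset[String], neither of which appears in the question). The object name ContactNameArrays, its helper methods, and the sample record are hypothetical and only stand in for the real file. Selecting a field through an array of structs returns one array of values per row rather than one row per contact.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ContactNameArrays {
  // Hypothetical sample record shaped like the schema in the question.
  val sampleJson: String =
    """{"person":{"name":"Person One","emergencyContacts":[""" +
    """{"name":"Contact One","phone":"000-0001","relationship":"friend"},""" +
    """{"name":"Contact Two","phone":"000-0002","relationship":"sibling"}]}}"""

  // Build a one-row DataFrame from the sample record (assumes Spark 2.2+).
  def sampleDf(spark: SparkSession): DataFrame = {
    import spark.implicits._
    spark.read.json(Seq(sampleJson).toDS())
  }

  // Accessing a field through an array of structs yields an array of that field,
  // so each input row produces a single array<string> column.
  def contactNameArrays(df: DataFrame): DataFrame =
    df.select(col("person.emergencyContacts.name").as("contactNames"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ContactNameArrays")
      .master("local[*]")
      .getOrCreate()
    val names = contactNameArrays(sampleDf(spark))
    names.printSchema()
    names.show(false)
    spark.stop()
  }
}
```

On the df from the question, the select inside contactNameArrays is the only part that matters; the rest just makes the sketch runnable on its own.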

Or you can iterate over the emergencyContacts array to get the phone and name details. Look up Scala array iteration.
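
One way to do that iteration is to flatten the array with explode from org.apache.spark.sql.functions, so that each contact becomes its own row, and then collect the names into a plain Scala array. The following is a sketch under the same kind of assumptions as any standalone example here: Spark 2.2 or later, and a hypothetical object ExplodeContacts with made-up sample data in place of the real file. It is not necessarily what the answerer had in mind.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object ExplodeContacts {
  // Hypothetical sample record shaped like the schema in the question.
  val sampleJson: String =
    """{"person":{"name":"Person One","emergencyContacts":[""" +
    """{"name":"Contact One","phone":"000-0001","relationship":"friend"},""" +
    """{"name":"Contact Two","phone":"000-0002","relationship":"sibling"}]}}"""

  // Build a one-row DataFrame from the sample record (assumes Spark 2.2+).
  def sampleDf(spark: SparkSession): DataFrame = {
    import spark.implicits._
    spark.read.json(Seq(sampleJson).toDS())
  }

  // explode turns each element of the array into its own row;
  // the struct fields can then be selected as ordinary columns.
  def contacts(df: DataFrame): DataFrame =
    df.select(explode(col("person.emergencyContacts")).as("contact"))
      .select(col("contact.name"), col("contact.phone"))

  // Bring the names back to the driver as a plain Scala array.
  def contactNames(df: DataFrame): Array[String] =
    contacts(df).select("name").collect().map(_.getString(0))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ExplodeContacts")
      .master("local[*]")
      .getOrCreate()
    val df = sampleDf(spark)
    contacts(df).show(false)
    // Ordinary Scala array iteration over the collected names.
    contactNames(df).foreach(println)
    spark.stop()
  }
}
```

Collecting to the driver is only reasonable when the number of contacts is small; for large data it is usually better to keep working with the exploded DataFrame.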