I have a JSON file with the following schema:
root
|-- demo: boolean (nullable = true)
|-- person: struct (nullable = true)
| |-- dateOfBirth: string (nullable = true)
| |-- email: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- emergencyContacts: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- name: string (nullable = true)
| | | |-- phone: string (nullable = true)
| | | |-- relationship: string (nullable = true)
| |-- id: long (nullable = true)
| |-- name: string (nullable = true)
| |-- phones: struct (nullable = true)
| | |-- home: string (nullable = true)
| | |-- mobile: string (nullable = true)
| |-- registered: boolean (nullable = true)
|-- product: string (nullable = true)
|-- releaseDate: string (nullable = true)
I want to parse the emergencyContacts array to get the contacts' names.
So far I have extracted the person struct with the following:
val df = sqlContext.read.json("file:///home/training211/test/cjson1.json").toDF()
df.registerTempTable("df")
df.printSchema()
val person = df.select("person")
person.registerTempTable("person")
person.printSchema()
person.show()
Whenever I try to go one level deeper, it always fails with an error: org.apache.spark.sql.AnalysisException: cannot resolve 'persons.emergencyContacts' given input columns: [person];
I also tried:
val arrayFlatten = df.select($"person.emergencyContacts".getItem(0))
which gave me:
+---------------------------+
|person.emergencyContacts[0]|
+---------------------------+
| [Jane Doe,888-555...|
+---------------------------+
But that is not the result I want.
Any help is appreciated.
Answer 0 (score: 0)
Can you try the following?
df.select($"person.emergencyContacts").show
If you want the phone, you can do this:
df.select($"person.emergencyContacts.phone").show
Or you can iterate over the emergencyContacts array to get the phone and name details. Look up Scala array iteration.
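Since the question asks for the names rather than the phones, here is a minimal sketch of two ways to get them. It assumes the spark-shell session from the question, so `sqlContext` and `df` are already defined; the names `namesPerPerson`, `contacts`, and `names` are only illustrative. The first select uses the same dotted path as above and should return one array of names per person. The second uses `explode` from `org.apache.spark.sql.functions` to produce one row per contact, which may be closer to what was asked for.

```scala
// Assumes the spark-shell session from the question: `sqlContext` and `df` already exist.
import org.apache.spark.sql.functions.explode
import sqlContext.implicits._ // enables the $"..." column syntax

// Option 1: selecting a field through an array of structs
// yields an array column with every contact's name for that person.
val namesPerPerson = df.select($"person.emergencyContacts.name")
namesPerPerson.show()

// Option 2: explode emits one output row per element of the array.
val contacts = df.select(explode($"person.emergencyContacts").as("contact"))

// Each row now holds a single contact struct, so its fields can be selected directly.
val names = contacts.select($"contact.name", $"contact.phone")
names.show()
```

Note that `explode` drops people whose emergencyContacts array is empty or null, so option 1 may be the better fit if every person has to appear in the result.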