Scala case class for an array of Maps

Time: 2019-01-20 16:02:15

Tags: scala apache-spark dictionary

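I come from a Python background and am just learning Scala. I want to declare a case class to use when reading data from a database through Spark. The data looks like this: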

|id  |  person_info
+----+-------------------------------------------------------------------------------------------------------------------
| 1  |[{"fname":"john","lname":"doe","user_id":123,"dept":"hr"},{"fname":"jane","lname":"doe","user_id":456,"dept":"sales"}] 
| 2  |[{"fname":"ed","lname":"smith","user_id":345,"dept":"it"}] 

I'm confused about person_info, because it also contains user_id, which is an Int rather than a String. This is what I tried:

case class Person(id: Int, person_info: Array[Map[String, String]])

person_info is created in SQL as follows:

SELECT id, named_struct("fname", t.first_name, "lname", t.lastname, "user_id", t.userid, "dept", t.department) as person_info FROM mytable t

2 answers:

Answer 0: (score: 3)

Given that the fields are always the same, you can use a nested case class instead of a Map.

final case class PersonInfoEntry(fname: String, lname: String, user_id: Int, dept: String)
final case class Person(id: Int, person_info: List[PersonInfoEntry])
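As a minimal sketch of why this helps: a Map[String, String] forces every value to be a String, whereas a nested case class gives each field its own type, so user_id stays an Int. The sample values below are taken from the question; the object name NestedCaseClassDemo and its members are only illustrative and do not appear in the original answer.

```scala
// Nested case classes from the answer above.
final case class PersonInfoEntry(fname: String, lname: String, user_id: Int, dept: String)
final case class Person(id: Int, person_info: List[PersonInfoEntry])

// Hypothetical demo object, used only to hold the example values.
object NestedCaseClassDemo {
  // Rows mirroring the sample data shown in the question.
  val people: List[Person] = List(
    Person(1, List(
      PersonInfoEntry("john", "doe", 123, "hr"),
      PersonInfoEntry("jane", "doe", 456, "sales"))),
    Person(2, List(PersonInfoEntry("ed", "smith", 345, "it")))
  )

  // user_id is an Int, so it can be used numerically without parsing.
  val userIds: List[Int] = people.flatMap(_.person_info.map(_.user_id))

  // Typed field access replaces string-keyed Map lookups.
  val salesNames: List[String] =
    people.flatMap(_.person_info).filter(_.dept == "sales").map(_.fname)
}
```

With this shape, a misspelled field name is a compile error rather than a missing Map key at runtime.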

Answer 1: (score: 2)

Assuming person_info is valid JSON, you can map it to an array of PersonDetails objects, as shown below:

// The field is named user_id so that it matches the key used in the data.
case class PersonDetails(fname: String, lname: String, user_id: Int, dept: String)
case class Person(id: Int, person_info: Array[PersonDetails])
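If person_info arrives as a JSON string column, one possible way to get to these case classes in Spark is to parse the column with from_json and then convert the result with as[Person]. This is a sketch under assumptions, not part of the original answer. It assumes Spark 2.2 or later on the classpath, a local SparkSession, and a field named user_id so that the case class field names match the JSON keys. The object name JsonToCaseClass, the helper parse, and the in-memory sample rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

// Field names match the JSON keys so Spark can resolve them by name.
case class PersonDetails(fname: String, lname: String, user_id: Int, dept: String)
case class Person(id: Int, person_info: Array[PersonDetails])

// Hypothetical helper object for illustration only.
object JsonToCaseClass {
  // Schema of the JSON text: an array of structs with the four known fields.
  val entrySchema: ArrayType = ArrayType(StructType(Seq(
    StructField("fname", StringType),
    StructField("lname", StringType),
    StructField("user_id", IntegerType),
    StructField("dept", StringType))))

  // Assumes `raw` has an Int column "id" and a JSON string column "person_info".
  def parse(spark: SparkSession, raw: DataFrame): Dataset[Person] = {
    import spark.implicits._
    raw
      .withColumn("person_info", from_json(col("person_info"), entrySchema))
      .as[Person]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("json-to-case-class").getOrCreate()
    import spark.implicits._

    // Sample rows from the question, held as JSON strings (an assumption).
    val raw = Seq(
      (1, """[{"fname":"john","lname":"doe","user_id":123,"dept":"hr"},{"fname":"jane","lname":"doe","user_id":456,"dept":"sales"}]"""),
      (2, """[{"fname":"ed","lname":"smith","user_id":345,"dept":"it"}]""")
    ).toDF("id", "person_info")

    parse(spark, raw).collect().foreach { p =>
      println(s"${p.id}: ${p.person_info.map(_.user_id).mkString(",")}")
    }
    spark.stop()
  }
}
```

If the column is already an array of structs rather than a JSON string, the from_json step is unnecessary and as[Person] can be applied to the DataFrame directly, provided the field names and types line up.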