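I have 3-5 JSON files containing various information such as departments, students, and courses. If someone searches for a keyword that matches a particular department name, I need to show the relevant data from the department JSON file, plus the details of the students (from file 2) who belong to that department.
How can I achieve this with Spark / Scala?
What I have tried so far: reading the data as DataFrames (one per JSON file).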
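For reference, a minimal sketch of what that reading step might look like. It assumes each file holds a single top-level JSON array spread over several lines, as in the samples in this post, which is why the `multiLine` option is set. The object name, helper names, and the temp-file sample are hypothetical and only there to make the sketch runnable on its own.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadJsonSketch {
  // Reads a file that holds one top-level JSON array spanning several lines.
  // With multiLine enabled Spark parses the whole file as one document,
  // and each element of the array becomes one row.
  def readJsonArray(spark: SparkSession, path: String): DataFrame =
    spark.read.option("multiLine", "true").json(path)

  // Hypothetical helper: writes sample JSON to a temp file and returns its path.
  def writeSample(json: String): String = {
    val file = Files.createTempFile("sample", ".json")
    Files.write(file, json.getBytes(StandardCharsets.UTF_8))
    file.toString
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-json-sketch")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()

    // Trimmed-down version of the DEPT sample from this post.
    val deptJson =
      """[{"_id": 101, "name": "Computer", "details": "CollegeDept 1"},
        | {"_id": 102, "name": "Mechanical", "details": "CollegeDept 2"}]""".stripMargin

    val deptDF = readJsonArray(spark, writeSample(deptJson))
    deptDF.printSchema()
    deptDF.show(false)

    spark.stop()
  }
}
```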
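But I am confused about how to search for the entered keyword in each DF. Would searching by key and/or value be an efficient approach? Are there other ways to look up the parsed information?
JSON 1: DEPT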
[{
"_id": 101,
"name": "Computer",
"Tags": ["B.Tech", "M.Tech", "BCA"],
"details": "CollegeDept 1"
}, {
"_id": 102,
"name": "Mechanical",
"domain_names": ["B.Tech", "M.Tech"],
"details": "CollegeDept 2"
}
…
…
]
JSON 2: STUDENT
[{
"_id": 1,
"name": "Adam Smith",
"enroll_dt": "2016-04-15T05:19:46 -10:00",
"active": true,
"last_login_at": "2013-08-04T01:03:27",
"phone": "111-222-888",
"signature": "Don't Worry Be Happy!",
"dept_id": 101,
"tags": [
"B.Tech",
"Computer"
]
},
{
"_id": 2,
"name": "Ron Dey",
"enroll_dt": "2016-04-15T05:19:46 -10:00",
"active": true,
"last_login_at": "2013-08-04T01:03:27",
"phone": "7080-656-878",
"signature": "Don't Worry Be Happy!",
"dept_id": 102,
"tags": [
"M.Tech",
"Mechanical"
]
}
]
If the search keyword is "Mechanical", the output should be
"_id": 102,
"name": "Mechanical",
"domain_names": ["B.Tech", "M.Tech"],
"details": "CollegeDept 2"
from JSON 1
and
Student 2: Ron Dey (and his details)
from JSON 2
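To make the expected behaviour concrete, here is a rough sketch of one possible approach, not a definitive solution: filter the department DataFrame on its `name` column, then keep only the students whose `dept_id` matches one of the resulting department `_id` values via a left semi join. The object and function names are hypothetical, the inline data is a trimmed copy of the samples above, and a case-insensitive substring match on `name` is an assumption about what "keyword search" should mean.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lower}

object DeptSearchSketch {
  // Departments whose name contains the keyword, ignoring case.
  def findDepts(deptDF: DataFrame, keyword: String): DataFrame =
    deptDF.filter(lower(col("name")).contains(keyword.toLowerCase))

  // Students whose dept_id equals the _id of one of the given departments.
  // A left semi join returns only the student columns.
  def studentsOf(studentDF: DataFrame, depts: DataFrame): DataFrame =
    studentDF.join(depts, studentDF("dept_id") === depts("_id"), "left_semi")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dept-search-sketch")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    import spark.implicits._

    // Stand-ins for the DataFrames read from JSON 1 and JSON 2.
    val deptDF = Seq(
      (101, "Computer", "CollegeDept 1"),
      (102, "Mechanical", "CollegeDept 2")
    ).toDF("_id", "name", "details")

    val studentDF = Seq(
      (1, "Adam Smith", 101),
      (2, "Ron Dey", 102)
    ).toDF("_id", "name", "dept_id")

    val matchedDepts = findDepts(deptDF, "Mechanical")
    val matchedStudents = studentsOf(studentDF, matchedDepts)

    matchedDepts.show(false)    // the department row(s) from JSON 1
    matchedStudents.show(false) // the matching student row(s) from JSON 2

    spark.stop()
  }
}
```

Searching array columns such as `tags` would need a different predicate (for example `array_contains`), which this sketch leaves out.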
Thanks in advance.