This is part of the JSON that I converted into a DataFrame:
{"business_id": "vcNAWiLM4dR7D2nwwJ7nCA", "full_address": "4840 E Indian School Rd\nSte 101\nPhoenix, AZ 85018", "hours": {"Tuesday": {"close": "17:00", "open": "08:00"}, "Friday": {"close": "17:00", "open": "08:00"}, "Monday": {"close": "17:00", "open": "08:00"}, "Wednesday": {"close": "17:00", "open": "08:00"}, "Thursday": {"close": "17:00", "open": "08:00"}}, "open": true, "categories": ["Doctors", "Health & Medical"], "city": "Phoenix", "review_count": 9, "name": "Eric Goldberg, MD", "neighborhoods": [], "longitude": -111.98375799999999, "state": "AZ", "stars": 3.5, "latitude": 33.499313000000001, "attributes": {"By Appointment Only": true}, "type": "business"}
... and so on.
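Now I only need to display the opening and closing times of every business for Tuesday. I tried using filter with isin as the condition, but that did not work. Can anyone guide me?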
Answer 0 (score: 1)
If the DataFrame was loaded correctly with this schema (I loaded the sample with spark.read.json):
scala> df.printSchema
root
|-- attributes: struct (nullable = true)
| |-- By Appointment Only: boolean (nullable = true)
|-- business_id: string (nullable = true)
|-- categories: array (nullable = true)
| |-- element: string (containsNull = true)
|-- city: string (nullable = true)
|-- full_address: string (nullable = true)
|-- hours: struct (nullable = true)
| |-- Friday: struct (nullable = true)
| | |-- close: string (nullable = true)
| | |-- open: string (nullable = true)
| |-- Monday: struct (nullable = true)
| | |-- close: string (nullable = true)
| | |-- open: string (nullable = true)
| |-- Thursday: struct (nullable = true)
| | |-- close: string (nullable = true)
| | |-- open: string (nullable = true)
| |-- Tuesday: struct (nullable = true)
| | |-- close: string (nullable = true)
| | |-- open: string (nullable = true)
| |-- Wednesday: struct (nullable = true)
| | |-- close: string (nullable = true)
| | |-- open: string (nullable = true)
|-- latitude: double (nullable = true)
|-- longitude: double (nullable = true)
|-- name: string (nullable = true)
|-- neighborhoods: array (nullable = true)
| |-- element: string (containsNull = true)
|-- open: boolean (nullable = true)
|-- review_count: long (nullable = true)
|-- stars: double (nullable = true)
|-- state: string (nullable = true)
|-- type: string (nullable = true)
then you can select the nested Tuesday struct directly with dot notation:
scala> df.select("hours.Tuesday").show
+--------------+
| Tuesday|
+--------------+
|[17:00, 08:00]|
+--------------+
If you only want the closing time, you can do:
scala> df.select("hours.Tuesday.close").show
+-----+
|close|
+-----+
|17:00|
+-----+