Displaying specific values in Spark SQL

Time: 2019-03-27 18:07:50

Tags: apache-spark apache-spark-sql

This is part of the JSON that I converted to a DataFrame:

    {"business_id": "vcNAWiLM4dR7D2nwwJ7nCA", "full_address": "4840 E Indian School Rd\nSte 101\nPhoenix, AZ 85018", "hours": {"Tuesday": {"close": "17:00", "open": "08:00"}, "Friday": {"close": "17:00", "open": "08:00"}, "Monday": {"close": "17:00", "open": "08:00"}, "Wednesday": {"close": "17:00", "open": "08:00"}, "Thursday": {"close": "17:00", "open": "08:00"}}, "open": true, "categories": ["Doctors", "Health & Medical"], "city": "Phoenix", "review_count": 9, "name": "Eric Goldberg, MD", "neighborhoods": [], "longitude": -111.98375799999999, "state": "AZ", "stars": 3.5, "latitude": 33.499313000000001, "attributes": {"By Appointment Only": true}, "type": "business"}

... and so on.

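Now I only need to display the Tuesday opening and closing times for all businesses. I tried using filter with an isin condition, but that did not work. Can anyone guide me?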

1 answer:

Answer 0: (score: 1)

If the DataFrame was loaded correctly with this schema (I loaded the sample with spark.read.json):

    scala> df.printSchema
    root
    |-- attributes: struct (nullable = true)
    |    |-- By Appointment Only: boolean (nullable = true)
    |-- business_id: string (nullable = true)
    |-- categories: array (nullable = true)
    |    |-- element: string (containsNull = true)
    |-- city: string (nullable = true)
    |-- full_address: string (nullable = true)
    |-- hours: struct (nullable = true)
    |    |-- Friday: struct (nullable = true)
    |    |    |-- close: string (nullable = true)
    |    |    |-- open: string (nullable = true)
    |    |-- Monday: struct (nullable = true)
    |    |    |-- close: string (nullable = true)
    |    |    |-- open: string (nullable = true)
    |    |-- Thursday: struct (nullable = true)
    |    |    |-- close: string (nullable = true)
    |    |    |-- open: string (nullable = true)
    |    |-- Tuesday: struct (nullable = true)
    |    |    |-- close: string (nullable = true)
    |    |    |-- open: string (nullable = true)
    |    |-- Wednesday: struct (nullable = true)
    |    |    |-- close: string (nullable = true)
    |    |    |-- open: string (nullable = true)
    |-- latitude: double (nullable = true)
    |-- longitude: double (nullable = true)
    |-- name: string (nullable = true)
    |-- neighborhoods: array (nullable = true)
    |    |-- element: string (containsNull = true)
    |-- open: boolean (nullable = true)
    |-- review_count: long (nullable = true)
    |-- stars: double (nullable = true)
    |-- state: string (nullable = true)
    |-- type: string (nullable = true)

you can select the nested Tuesday struct directly with a dotted path:

    scala> df.select("hours.Tuesday").show
    +--------------+
    |       Tuesday|
    +--------------+
    |[17:00, 08:00]|
    +--------------+

If you only want the closing time, you can go one level deeper:

    scala> df.select("hours.Tuesday.close").show
    +-----+
    |close|
    +-----+
    |17:00|
    +-----+