Spark SQL JSON boolean evaluation

Time: 2015-05-21 20:45:46

Tags: apache-spark pyspark

I have a sample JSON schema (truncated because of its size):

 |-- LinearScheduleResult: struct (nullable = true)
 |    |-- Build: string (nullable = true)
 |    |-- EndTimestamp: string (nullable = true)
 |    |-- Errors: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- RequestId: string (nullable = true)
 |    |-- Schedule: struct (nullable = true)
 |    |    |-- Airings: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- AiringTime: string (nullable = true)
 |    |    |    |    |-- AiringType: string (nullable = true)
 |    |    |    |    |-- CC: boolean (nullable = true)
 |    |    |    |    |-- CallLetters: string (nullable = true)
 |    |    |    |    |-- Category: string (nullable = true)
 |    |    |    |    |-- Channel: string (nullable = true)
 |    |    |    |    |-- Color: string (nullable = true)
 |    |    |    |    |-- Copy: string (nullable = true)
 |    |    |    |    |-- DSS: boolean (nullable = true)
 |    |    |    |    |-- DVS: boolean (nullable = true)
 |    |    |    |    |-- Dolby: boolean (nullable = true)
 |    |    |    |    |-- Duration: long (nullable = true)
 |    |    |    |    |-- DvbTriplet: string (nullable = true)
 |    |    |    |    |-- EpisodeTitle: string (nullable = true)
 |    |    |    |    |-- HD: boolean (nullable = true)
 |    |    |    |    |-- HDLevel: string (nullable = true)
 |    |    |    |    |-- IconAvailable: boolean (nullable = true)
 |    |    |    |    |-- InstanceId: string (nullable = true)
 |    |    |    |    |-- LetterBox: boolean (nullable = true)
 |    |    |    |    |-- MovieRating: string (nullable = true)
 |    |    |    |    |-- ParentNetworkId: long (nullable = true)
 |    |    |    |    |-- ProgramId: string (nullable = true)
 |    |    |    |    |-- SAP: boolean (nullable = true)
 |    |    |    |    |-- SL: string (nullable = true)
 |    |    |    |    |-- SeriesId: string (nullable = true)
 |    |    |    |    |-- ServiceId: long (nullable = true)
 |    |    |    |    |-- ShowingType: string (nullable = true)
 |    |    |    |    |-- SourceDisplayName: string (nullable = true)
 |    |    |    |    |-- SourceId: long (nullable = true)
 |    |    |    |    |-- SourceLongName: string (nullable = true)
 |    |    |    |    |-- Sports: boolean (nullable = true)

When I run the following:

results = sqlContext.sql("SELECT LinearScheduleResult.Schedule.Airings.Sports from tv")

It returns:

[Row(Sports=[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False])]

When I try something more complex:

results = sqlContext.sql("SELECT LinearScheduleResult.Schedule.Airings from tv where LinearScheduleResult.Schedule.Airings.Sports = 'False'")

it never returns anything. I have tried 'false', false, 0, FALSE, and many other combinations.

Any help would be appreciated.

1 answer:

Answer 0 (score: 1)

Airings is an array, so you need to explode it into one row per airing first. Something like:

select a from tv 
  lateral view explode(LinearScheduleResult.Schedule.Airings) a as a 
  where a.Sports = false

You have to use a HiveContext for this.

See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
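A plain-Python model of what the answer describes may make the difference clearer. This is only an illustration of the row semantics, not Spark code, and the sample rows and ProgramId values are hypothetical. The first query yields one list of booleans per document, so a predicate on Airings.Sports is looking at an array rather than a single flag; exploding turns each airing into its own row, so a.Sports = false compares one boolean at a time.

```python
# Plain-Python model of the row semantics; this is NOT Spark code.
# Each element of `tv` stands for one parsed JSON document (one row of table tv).
# The sample data and ProgramId values are hypothetical.
tv = [
    {"LinearScheduleResult": {"Schedule": {"Airings": [
        {"ProgramId": "P1", "Sports": False},
        {"ProgramId": "P2", "Sports": True},
        {"ProgramId": "P3", "Sports": False},
        {"ProgramId": "P4", "Sports": None},  # null Sports value
    ]}}},
]


def airings(row):
    """LinearScheduleResult.Schedule.Airings for one row; a null array explodes to no rows."""
    return row["LinearScheduleResult"]["Schedule"]["Airings"] or []


# SELECT LinearScheduleResult.Schedule.Airings.Sports FROM tv
# -> one value per document, and that value is a whole list of booleans
projected = [[a["Sports"] for a in airings(row)] for row in tv]

# ... LATERAL VIEW explode(LinearScheduleResult.Schedule.Airings) a AS a
# -> one output row per element of the Airings array
exploded = [a for row in tv for a in airings(row)]

# ... WHERE a.Sports = false
# -> the predicate now sees a single boolean; a null Sports makes the
#    comparison null (not true), so that row is dropped, as `is False` does here
non_sports = [a for a in exploded if a["Sports"] is False]

print(projected)
print([a["ProgramId"] for a in non_sports])
```

Note that the WHERE clause in the question filters whole documents, not airings, so even a comparison that type-checked would return the entire Airings array for each matching document; exploding first is what lets the filter act on individual airings.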