How do we explode nested structures in Spark?

Time: 2017-05-09 18:29:03

Tags: json apache-spark spark-dataframe hadoop2 pyspark-sql

I am trying to find a way to explode the schema below and retrieve, from under each counter:

jobid & groups->element->counts->element->displayname, value

```

 |-- event: struct (nullable = true)
 |    |-- org.apache.hadoop.mapreduce.jobhistory.JobFinished: struct (nullable = true)
 |    |    |-- failedMaps: long (nullable = true)
 |    |    |-- failedReduces: long (nullable = true)
 |    |    |-- finishTime: long (nullable = true)
 |    |    |-- finishedMaps: long (nullable = true)
 |    |    |-- finishedReduces: long (nullable = true)
 |    |    |-- jobid: string (nullable = true)
 |    |    |-- mapCounters: struct (nullable = true)
 |    |    |    |-- groups: array (nullable = true)
 |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |-- counts: array (nullable = true)
 |    |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |    |-- displayName: string (nullable = true)
 |    |    |    |    |    |    |    |-- name: string (nullable = true)
 |    |    |    |    |    |    |    |-- value: long (nullable = true)
 |    |    |    |    |    |-- displayName: string (nullable = true)
 |    |    |    |    |    |-- name: string (nullable = true)
 |    |    |    |-- name: string (nullable = true)
 |    |    |-- reduceCounters: struct (nullable = true)
 |    |    |    |-- groups: array (nullable = true)
 |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |-- counts: array (nullable = true)
 |    |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |    |-- displayName: string (nullable = true)
 |    |    |    |    |    |    |    |-- name: string (nullable = true)
 |    |    |    |    |    |    |    |-- value: long (nullable = true)
 |    |    |    |    |    |-- displayName: string (nullable = true)
 |    |    |    |    |    |-- name: string (nullable = true)
 |    |    |    |-- name: string (nullable = true)
 |    |    |-- totalCounters: struct (nullable = true)
 |    |    |    |-- groups: array (nullable = true)
 |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |-- counts: array (nullable = true)
 |    |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |    |-- displayName: string (nullable = true)
 |    |    |    |    |    |    |    |-- name: string (nullable = true)
 |    |    |    |    |    |    |    |-- value: long (nullable = true)
 |    |    |    |    |    |-- displayName: string (nullable = true)
 |    |    |    |    |    |-- name: string (nullable = true)
 |    |    |    |-- name: string (nullable = true)
 |-- type: string (nullable = true)
```


```
scala> val df = sqlContext.read.json("hdfs:///xyz/jfinished.json")
scala> var explodeDF3 = df.withColumn("org.apache.hadoop.mapreduce.jobhistory.JobFinished", df("event.org.apache.hadoop.mapreduce.jobhistory.JobFinished"))
```

org.apache.spark.sql.AnalysisException: No such struct field org in org.apache.hadoop.mapreduce.jobhistory.JobFinished;

**org.apache.hadoop.mapreduce.jobhistory.JobFinished is a struct**

But it throws an error. Is there a way to do this?
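A minimal sketch of one possible approach (untested, assuming Spark 1.6+ with `sqlContext`): since the field name itself contains dots, quote it with backticks when referencing it, then flatten the two array levels with `explode`. `totalCounters` is used below as the example; the same pattern should apply to `mapCounters` and `reduceCounters`.

```
import org.apache.spark.sql.functions.{col, explode}

val df = sqlContext.read.json("hdfs:///xyz/jfinished.json")

// The field name itself contains dots, so quote it with backticks; otherwise
// Spark tries to resolve a nested field called "org" inside "event", which is
// what triggers the AnalysisException above.
val jf = df.select(
  col("event.`org.apache.hadoop.mapreduce.jobhistory.JobFinished`").alias("jf"))

// Flatten the nested arrays one level at a time: groups first, then counts.
val counters = jf
  .select(col("jf.jobid").alias("jobid"),
          explode(col("jf.totalCounters.groups")).alias("grp"))
  .select(col("jobid"),
          explode(col("grp.counts")).alias("cnt"))
  .select(col("jobid"),
          col("cnt.displayName"),
          col("cnt.value"))

counters.show(false)
```

If this works as intended, each row of `counters` should hold one (jobid, displayName, value) triple per counter under `totalCounters`.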

0 Answers:

No answers