Question

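I have parsed Parquet data with the columns DocumentData, RetailData, and _Cbs. _Cbs is a string column, but DocumentData is a deeply nested struct. Because it is so complex, I have not been able to explode it. The schema is as follows: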

 dataFrame.printSchema()

 root
  |-- DocumentData: struct (nullable = true)
  |    |-- Document: struct (nullable = true)
  |    |    |-- Cbs:CreationDt: string (nullable = true)
  |    |    |-- Cbs:DataClassification: struct (nullable = true)
  |    |    |    |-- Abs:BusinessSensitivityLevel: struct (nullable = true)
  |    |    |    |    |-- Cbs:Code: string (nullable = true)
  |    |    |    |-- Cbs:DataClassificationLevel: struct (nullable = true)
  |    |    |    |    |-- Cbs:Code: long (nullable = true)
  |    |    |    |    |-- Cbs:Description: string (nullable = true)
  |    |    |    |-- Cbs:PCIdataInd: string (nullable = true)
  |    |    |    |-- Cbs:PHIdataInd: string (nullable = true)
  |    |    |    |-- Cbs:PPIdataInd: string (nullable = true)
  |    |    |-- Cbs:DocumentNm: string (nullable = true)
  |    |    |-- Cbs:GatewayNm: string (nullable = true)
  |    |    |-- Cbs:InboundOutboundInd: string (nullable = true)
  |    |    |-- Cbs:InternalFileTransferInd: string (nullable = true)
  |    |    |-- Cbs:SourceApplicationCd: string (nullable = true)
  |    |    |-- Cbs:TargetApplicationCd: string (nullable = true)
  |    |-- DocumentAction: struct (nullable = true)
  |    |    |-- Cbs:ActionTypeCd: string (nullable = true)
  |    |    |-- Cbs:RecordTypeCd: string (nullable = true)

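I have been exploring with Spark statements like the ones below and have tried several variations to actually explode the struct, but nothing has worked.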

  dataFrame.select("_Cbs",explode("DocumentData.*")).show()
  dataFrame.select("_Cbs","DocumentData.*").show()

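The second statement only expands the column by one level; it does not seem to flatten anything. I want every field as a separate column in a Hive table. I only need to flatten the struct, since converting the DataFrame to a table is not a problem. How can I flatten the structure above? A single example would be enough. Thanks.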
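For context, `explode` applies to array and map columns, not structs, which is probably why the first statement fails. One possible approach is to walk the schema recursively and select every leaf field by its full path. The following is only a minimal sketch, assuming PySpark: `flatten_exprs` and `sample_schema` are hypothetical names, the sample is a trimmed copy of the schema above written in the dict form that `dataFrame.schema.jsonValue()` returns, and replacing `:` with `_` in the aliases is an assumption made to keep the column names Hive-friendly.

```python
def flatten_exprs(schema_json, sep="_"):
    """Build one SQL select expression per leaf field of a nested struct schema.

    schema_json is a dict in the shape returned by DataFrame.schema.jsonValue().
    Arrays and maps are treated as leaves and are not exploded here.
    Field names are assumed not to contain backticks.
    """
    exprs = []

    def walk(fields, path):
        for field in fields:
            full = path + [field["name"]]
            ftype = field["type"]
            if isinstance(ftype, dict) and ftype.get("type") == "struct":
                # Nested struct: descend and keep building the path.
                walk(ftype["fields"], full)
            else:
                # Leaf: quote each path segment so names like "Cbs:Code" parse.
                source = ".".join("`{}`".format(part) for part in full)
                # Assumption: ":" is replaced with "_" in the output column name.
                alias = sep.join(full).replace(":", "_")
                exprs.append("{} AS `{}`".format(source, alias))

    walk(schema_json["fields"], [])
    return exprs


def leaf(name, type_name="string"):
    """Helper for the sample: a simple nullable field."""
    return {"name": name, "type": type_name, "nullable": True, "metadata": {}}


def struct(name, fields):
    """Helper for the sample: a nullable struct field."""
    return {"name": name, "type": {"type": "struct", "fields": fields},
            "nullable": True, "metadata": {}}


# Trimmed, hand-written copy of the schema from the question.
sample_schema = {
    "type": "struct",
    "fields": [
        struct("DocumentData", [
            struct("Document", [leaf("Cbs:CreationDt")]),
            struct("DocumentAction", [leaf("Cbs:ActionTypeCd")]),
        ]),
        leaf("_Cbs"),
    ],
}

for expr in flatten_exprs(sample_schema):
    print(expr)

# With a real DataFrame (requires pyspark; not executed here):
# flat = dataFrame.selectExpr(*flatten_exprs(dataFrame.schema.jsonValue()))
# flat.printSchema()
```

Prefixing each alias with its full path keeps the two different `Cbs:Code` fields from colliding after flattening. If shorter column names are preferred, the alias rule is the only line that needs to change.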

How to flatten a struct in Spark?

0 answers: