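I have this schema, and I want to split the contents of result into columns, so that I get col1: EventCode, col2: Message, and so on. I am using PySpark. I tried the explode function, but it doesn't seem to work on a StructType. Is there a way to do this in Spark?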
root
|-- result: struct (nullable = true)
| |-- EventCode: string (nullable = true)
| |-- Message: string (nullable = true)
| |-- _bkt: string (nullable = true)
| |-- _cd: string (nullable = true)
| |-- _indextime: string (nullable = true)
| |-- _pre_msg: string (nullable = true)
| |-- _raw: string (nullable = true)
| |-- _serial: string (nullable = true)
| |-- _si: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- _sourcetype: string (nullable = true)
| |-- _time: string (nullable = true)
| |-- host: string (nullable = true)
| |-- index: string (nullable = true)
| |-- linecount: string (nullable = true)
| |-- source: string (nullable = true)
| |-- sourcetype: string (nullable = true)
Answer 0 (score: 1)
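Flattening the struct into plain columns is easy. All you have to do is select every field of the result struct with result.* and assign the outcome to another DataFrame, like this: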
simpleDF = df.select("result.*")
This converts the schema given above into the following one:
simpleDF.printSchema()
root
|-- EventCode: string (nullable = true)
|-- Message: string (nullable = true)
|-- _bkt: string (nullable = true)
|-- _cd: string (nullable = true)
|-- _indextime: string (nullable = true)
|-- _pre_msg: string (nullable = true)
|-- _raw: string (nullable = true)
|-- _serial: string (nullable = true)
|-- _si: array (nullable = true)
| |-- element: string (containsNull = true)
|-- _sourcetype: string (nullable = true)
|-- _time: string (nullable = true)
|-- host: string (nullable = true)
|-- index: string (nullable = true)
|-- linecount: string (nullable = true)
|-- source: string (nullable = true)
|-- sourcetype: string (nullable = true)
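If you need only some of the struct's fields, or want to rename them while flattening, it can help to build the column list yourself instead of relying on result.*. The following is a minimal sketch, not a definitive implementation. The helper names struct_field_paths and expand_struct are hypothetical, and the sketch assumes the struct is a top-level column and that the schema has the dictionary layout returned by df.schema.jsonValue().

```python
def struct_field_paths(schema_json, struct_name):
    """Return dotted paths such as "result.EventCode" for every field
    of the top-level struct column `struct_name`.

    `schema_json` is assumed to be the dict produced by
    df.schema.jsonValue().
    """
    for field in schema_json["fields"]:
        if field["name"] == struct_name:
            dtype = field["type"]
            # Simple types are plain strings ("string"); complex types are dicts.
            if not (isinstance(dtype, dict) and dtype.get("type") == "struct"):
                raise TypeError(f"{struct_name!r} is not a struct column")
            return [f"{struct_name}.{sub['name']}" for sub in dtype["fields"]]
    raise KeyError(struct_name)


def expand_struct(df, struct_name):
    """Select each field of the struct as its own top-level column."""
    paths = struct_field_paths(df.schema.jsonValue(), struct_name)
    # Hypothetical place to filter or reorder `paths` before selecting.
    return df.select(*paths)
```

With the DataFrame from the question, expand_struct(df, "result") should give the same columns as df.select("result.*"). The difference is that the list of paths is available as ordinary Python strings, so you could, for example, drop the fields whose names start with an underscore before calling select.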