How do I explode a map type column in a PySpark DataFrame?

Time: 2020-10-06 12:55:13

Tags: python dataframe pyspark

I have a DataFrame built like this:

import os, sys
import json, time, random, string, requests
import pyodbc 
from pyspark import SparkConf, SparkContext, SQLContext
from pyspark.sql.functions import explode, col, from_json, lit
from pyspark.sql import functions as f
from pyspark.sql import SparkSession
from pyspark.sql.types import *
...
df = data.withColumn("dev_serial", col("data.dev_serial")) \
   .withColumn("dev_property", from_json(col("data.dev_property"), MapType(StringType(), StringType())) )\
   .drop("data")
df.show(truncate=False)
df.printSchema()

The result is here:

[image: output of df.show() and df.printSchema()]

I want to explode the dev_property column so that the output looks like this:

dev_serial / use_event / item / ...
value1 / value2 / value3 / value4
.
.
.

How can I explode it?

1 Answer:

Answer 0: (score: 1)

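If you want to explode the dev_property column into two columns (by default named key and value, with one row per map entry), this script will help: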

df2 = df.select(df.dev_serial, explode(df.dev_property))
df2.printSchema()
df2.show()

Read more about how explode works on Array and Map types.