I have a dataframe:
import os, sys
import json, time, random, string, requests
import pyodbc
from pyspark import SparkConf, SparkContext, SQLContext
from pyspark.sql.functions import explode, col, from_json, lit
from pyspark.sql import functions as f
from pyspark.sql import SparkSession
from pyspark.sql.types import *
...
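# Pull dev_serial out of the nested "data" struct and parse the dev_property
# JSON string into a map<string, string>, then drop the original struct column.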
df = data.withColumn("dev_serial", col("data.dev_serial")) \
.withColumn("dev_property", from_json(col("data.dev_property"), MapType(StringType(), StringType())) )\
.drop("data")
df.show(truncate=False)
df.printSchema()
Here is the result:
I want to explode the dev_property column so that each map key becomes its own column, like this:
dev_serial | use_event | item   | ...
value1     | value2    | value3 | value4
...
How can I explode it?
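For reference, a minimal self-contained construction of the data dataframe that reproduces this structure (the serial numbers and JSON payloads below are invented for illustration):

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample: a struct column "data" holding dev_serial and a
# JSON string dev_property, mirroring the dataframe described above.
inner = StructType([
    StructField("dev_serial", StringType()),
    StructField("dev_property", StringType()),
])
sample = [
    (("SN-001", '{"use_event": "power_on", "item": "sensor"}'),),
    (("SN-002", '{"use_event": "power_off", "item": "switch"}'),),
]
data = spark.createDataFrame(sample, StructType([StructField("data", inner)]))

# Applying the transformation above then prints:
# root
#  |-- dev_serial: string (nullable = true)
#  |-- dev_property: map (nullable = true)
#  |    |-- key: string
#  |    |-- value: string (valueContainsNull = true)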
Answer 0 (score: 1)
When you want to explode the dev_property map column into key and value columns, this script will help:
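# explode() on a MapType column produces one row per map entry,
# with two generated columns named "key" and "value".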
df2 = df.select(df.dev_serial, explode(df.dev_property))
df2.printSchema()
df2.show()
Read more about how explode works on ArrayType and MapType columns.
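If the goal is one column per map key, as in the expected output in the question, a follow-up sketch along these lines may help (the key names are only known from the question's example, and collecting the keys triggers an extra Spark job):

# Gather the distinct keys of the map, then project each key into its own column.
keys = (df.select(explode(df.dev_property))
          .select("key")
          .distinct()
          .rdd.flatMap(lambda row: row)
          .collect())

df3 = df.select("dev_serial",
                *[col("dev_property").getItem(k).alias(k) for k in keys])
df3.show(truncate=False)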