I have a DataFrame like this:
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
| brand| diesel| e10| e5| houseNumber| id| isOpen| lat| lng| name| place| postCode| street| Datum|
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|[TOTAL, ARAL, She...|[1.049, 1.029, 1....|[1.249, 1.209, 1....|[1.269, 1.229, 1....|[49, 12-14, , , ...|[4409a024-b190-4b...|[true, true, true...|[50.93128, 50.952...|[6.962356, 6.9616...|[TOTAL KOELN, Ara...|[KOELN, Köln, KOE...|[50676, 50668, 50...|[HOLZMARKT, Riehl...|2016-08-01 10:50:...|
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
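Basically, every column is an array; the data comes from nested JSON.
I tried to explode it, but explode only works on one column per select statement. How can I unpack all the arrays at once in PySpark while keeping the elements aligned (the i-th brand with the i-th diesel price, and so on)?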
Answer (score: 0)
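One way is to first merge all the arrays into a single column of element-wise pairs, then explode that column, and finally split each pair back into separate columns.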
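To make the three steps concrete, here is a minimal plain-Python model of merge, explode and split. It uses the toy arrays from this answer and needs no Spark; the dicts standing in for DataFrame rows are only an illustration, and the name `_c0` is copied from the output table (newer Spark versions name the exploded column `col`).

```python
# Plain-Python model of the merge -> explode -> split idea (no Spark needed).
# The dicts below stand in for DataFrame rows; they are an illustration only.

row = {"_1": [1, 2, 3, 4], "_2": [10, 20, 30, 40]}

# Step 1 (merge): pair the i-th elements, like the UDF that builds 'merge'.
merge = [[a, b] for a, b in zip(row["_1"], row["_2"])]

# Step 2 (explode): one output row per pair, keeping the merged array alongside.
exploded = [{"merge": merge, "_c0": pair} for pair in merge]

# Step 3 (split): turn each pair back into two scalar columns.
split = [{"_1": r["_c0"][0], "_2": r["_c0"][1]} for r in exploded]

for r in split:
    print(r)
```

Because the pairing happens before the explode, the i-th element of `_1` can never end up next to a different index of `_2`.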
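from pyspark.sql.functions import udf, col, explode
from pyspark.sql.types import ArrayType, IntegerType

df = sc.parallelize([[[1, 2, 3, 4], [10, 20, 30, 40]]]).toDF()
# Pair up the i-th elements of both arrays: [[1, 10], [2, 20], ...]
F = udf(lambda x, y: [[a, b] for a, b in zip(x, y)], ArrayType(ArrayType(IntegerType())))
# Explode the merged column so that each pair becomes its own row
df.withColumn('merge', F(df._1, df._2)) \
  .select(['merge', explode(col('merge'))])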
First, df looks like this:
+------------+----------------+
| _1| _2|
+------------+----------------+
|[1, 2, 3, 4]|[10, 20, 30, 40]|
+------------+----------------+
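The final select then produces the following; the last column holds the pairs you still need to split into separate columns: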
+--------------------+-------+
| merge| _c0|
+--------------------+-------+
|[WrappedArray(1, ...|[1, 10]|
|[WrappedArray(1, ...|[2, 20]|
|[WrappedArray(1, ...|[3, 30]|
|[WrappedArray(1, ...|[4, 40]|
+--------------------+-------+
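The question has 14 array columns rather than two, so the same idea has to work for any number of named columns. Below is a plain-Python sketch of that generalization; `explode_parallel` is a hypothetical helper, not a Spark API, and the sample uses only the first two visible values of three columns from the question's table. In Spark 2.4 and later, `pyspark.sql.functions.arrays_zip` performs the merge step natively, so the UDF is not needed there.

```python
from typing import Dict, List, Sequence


def explode_parallel(row: Dict[str, Sequence]) -> List[Dict[str, object]]:
    """Turn one row of equal-length array columns into one dict per index.

    Hypothetical helper mirroring merge/explode/split for any number of
    columns, so the i-th brand stays with the i-th diesel price, and so on.
    """
    lengths = {len(v) for v in row.values()}
    if len(lengths) > 1:
        # zip() would silently truncate to the shortest array; fail loudly instead.
        raise ValueError(f"array columns differ in length: {sorted(lengths)}")
    names = list(row)
    # zip(*arrays) walks all columns in lockstep: one tuple per index.
    return [dict(zip(names, values)) for values in zip(*row.values())]


# First two visible values of three columns from the question's table.
sample = {
    "brand": ["TOTAL", "ARAL"],
    "diesel": [1.049, 1.029],
    "e10": [1.249, 1.209],
}
for r in explode_parallel(sample):
    print(r)
```

The explicit length check matters for JSON-derived data: if one array is shorter, plain `zip` would drop the extra elements without any warning.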