I have a DataFrame with a column that contains an array. I want to convert this array into separate columns:
>>> from pyspark.sql import Row
>>> from pyspark.mllib.linalg import DenseVector
>>> df = spark.createDataFrame([Row(a=1, intlist=DenseVector([1,2,3])), Row(a=2, intlist=DenseVector([4,5,6]))])
>>> df.show()
+---+-------------+
|  a|      intlist|
+---+-------------+
|  1|[1.0,2.0,3.0]|
|  2|[4.0,5.0,6.0]|
+---+-------------+
Expected output:
+---+---+---+---+
|  a| _1| _2| _3|
+---+---+---+---+
|  1|  1|  2|  3|
|  2|  4|  5|  6|
+---+---+---+---+
The explode function can unpack the array, but it adds rows instead of columns:
>>> from pyspark.sql.functions import explode
>>> df.select(explode(df.intlist).alias("anInt")).show()
+-----+
|anInt|
+-----+
|    1|
|    2|
|    3|
|    4|
|    5|
|    6|
+-----+
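To make the difference concrete, here is a plain-Python sketch with no Spark involved. The helper names explode_rows and widen_rows are hypothetical and exist only for illustration; the first mimics the row-wise result shown above, the second produces the column-wise shape I am after.

```python
# Plain-Python illustration only; this is not Spark code.
# Each tuple stands for one row: (a, intlist).
rows = [(1, [1, 2, 3]), (2, [4, 5, 6])]


def explode_rows(rows):
    """Mimic explode: emit one output row per array element."""
    return [(value,) for _, values in rows for value in values]


def widen_rows(rows):
    """Desired shape: keep one row per input row, one column per element."""
    return [(a, *values) for a, values in rows]


print(explode_rows(rows))  # [(1,), (2,), (3,), (4,), (5,), (6,)]
print(widen_rows(rows))    # [(1, 1, 2, 3), (2, 4, 5, 6)]
```

The second result corresponds to the columns a, _1, _2, _3 in the expected output.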
Is there a way to add columns instead of rows?