Answer 0 (score: 2)
You can do this by creating an RDD of Rows, defining a schema, and using both to convert the RDD into a DataFrame:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// A seq of seqs
val s = Seq(1 to 6, 1 to 6, 1 to 6)
// Create an RDD of Rows
val rdd = sc.parallelize(s).map(Row.fromSeq)
// Define a schema based on the first seq of s
val schema = StructType(
  (1 to s(0).size).map(i => StructField("c" + i, IntegerType, true))
)
// Finally, create the DataFrame
val df = spark.createDataFrame(rdd, schema)
df.show
df.show
// +---+---+---+---+---+---+
// | c1| c2| c3| c4| c5| c6|
// +---+---+---+---+---+---+
// | 1| 2| 3| 4| 5| 6|
// | 1| 2| 3| 4| 5| 6|
// | 1| 2| 3| 4| 5| 6|
// +---+---+---+---+---+---+
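As a side note, not part of the original answer: when the inner sequences have a known, fixed length, the explicit schema can be skipped by mapping each row to a tuple and letting toDF infer the column types. This sketch assumes a SparkSession named spark and the s defined above:

```scala
import spark.implicits._

// Sketch (an assumption): pattern-match each fixed-size Seq into a tuple,
// then name the columns directly via toDF.
val df2 = s.map { case Seq(a, b, c, d, e, f) => (a, b, c, d, e, f) }
  .toDF("c1", "c2", "c3", "c4", "c5", "c6")
```

The pattern match will throw at runtime if an inner seq does not have exactly six elements, so the explicit-schema approach above is safer for variable-length input.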
Answer 1 (score: 1)
If you have the DataFrame mentioned in the question, with the array column as
root
|-- features: array (nullable = true)
| |-- element: integer (containsNull = false)
then you can use the following logic:
import org.apache.spark.sql.functions._

val finalCols = Array("c1", "c2", "c3", "c4", "c5", "c6", "c7")

finalCols.zipWithIndex
  .foldLeft(df) { (tempdf, c) => tempdf.withColumn(c._1, col("features")(c._2)) }
  .select(finalCols.map(col): _*)
  .show(false)
which should give you
+---+---+---+---+---+---+---+
|c1 |c2 |c3 |c4 |c5 |c6 |c7 |
+---+---+---+---+---+---+---+
|0 |45 |63 |0 |0 |0 |0 |
|0 |0 |0 |85 |0 |69 |0 |
|0 |89 |56 |0 |0 |0 |0 |
+---+---+---+---+---+---+---+
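The foldLeft above adds one column per index and then selects them. A sketch of a more compact variant (an assumption, not from the original answer) builds all seven column expressions up front and does a single select, pulling each array element out by index:

```scala
import org.apache.spark.sql.functions.col

// Sketch: one expression per output column, indexing into the array
// and aliasing as c1..c7, applied in a single select.
val cols = (0 until 7).map(i => col("features")(i).as("c" + (i + 1)))
df.select(cols: _*).show(false)
```

Indexing with col("features")(i) returns null when the array is shorter than i + 1, which matches the behavior of the foldLeft version.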
Alternatively, you can use a udf function:
import org.apache.spark.sql.functions._

// Case class matching the seven output columns (defined at top level so
// Spark can derive an encoder for it)
case class testCaseClass(c1: Int, c2: Int, c3: Int, c4: Int, c5: Int, c6: Int, c7: Int)

def splitArrayUdf = udf((features: Seq[Int]) =>
  testCaseClass(features(0), features(1), features(2), features(3), features(4), features(5), features(6))
)

df.select(splitArrayUdf(col("features")).as("features"))
  .select(col("features.*"))
  .show(false)
which should give you the same result.
I hope this answer helps.