How to convert an IndexedSeq[DataFrame] to a DataFrame?

Asked: 2017-01-18 18:22:04

Tags: scala apache-spark-sql

Newbie question: I am trying to add columns to an existing DataFrame. I am using Spark 1.4.1.

import sqlContext.implicits._
case class Test(rule: Int)

val test = sc.parallelize((1 to 2).map(i => Test(i-i))).toDF
test.registerTempTable("test")
test.show

+----+
|rule|
+----+
|   0|
|   0|
+----+

Then adding a single column works fine:

import org.apache.spark.sql.functions.lit
val t1 = test.withColumn("1", lit(0))
t1.show

+----+-+
|rule|1|
+----+-+
|   0|0|
|   0|0|
+----+-+

The problem appears when I try to add multiple columns:

val t1 = (1 to 5).map(i => test.withColumn(i.toString, lit(i)))
t1.show()

error: value show is not a member of scala.collection.immutable.IndexedSeq[org.apache.spark.sql.DataFrame]
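The error makes sense: map applies the function independently to each element of 1 to 5, so the result is a collection of five separate DataFrames (each one the original test plus a single extra column), not one DataFrame with five new columns. A plain-Scala analogy (illustrative only, no Spark involved) shows the same shape:

```scala
// map applies the function to the *same* starting value each time,
// producing a collection of independent results -- not one accumulated value.
val base = List(0)
val mapped = (1 to 3).map(i => base :+ i)
// mapped is an IndexedSeq of three independent lists, each built from `base`
println(mapped)
```

Calling show on such a collection fails for exactly the reason the compiler states: IndexedSeq has no show method, only its DataFrame elements do.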

1 Answer:

Answer 0 (score: 1)

What you need here is a reduce-style process, so instead of map you can use foldLeft, with the test data frame as your initial value:

val t1 = (1 to 5).foldLeft(test) { case (df, i) => df.withColumn(i.toString, lit(i)) }

t1.show
+----+---+---+---+---+---+
|rule|  1|  2|  3|  4|  5|
+----+---+---+---+---+---+
|   0|  1|  2|  3|  4|  5|
|   0|  1|  2|  3|  4|  5|
+----+---+---+---+---+---+
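For the same reason, foldLeft is the right shape for this task: it threads a single accumulator through the sequence instead of producing one result per element. A minimal plain-Scala sketch of that threading (illustrative only, no Spark required):

```scala
// foldLeft starts from an initial value and feeds each intermediate
// result back into the next step, ending with a single accumulated value.
val folded = (1 to 3).foldLeft(List(0)) { (acc, i) => acc :+ i }
println(folded)  // one list that grew across all iterations
```

In the Spark answer above, the accumulator is the DataFrame itself: each step receives the DataFrame produced so far and returns it with one more column attached.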