Appending rows to a DataFrame

Date: 2020-01-08 11:01:36

Tags: scala apache-spark apache-spark-sql

I want to keep appending new rows to a DataFrame, as shown in the example below.

for (a <- value) {
  val num = a
  val count = a + 10
  // creating a df with the above values
  val data = Seq((num.asInstanceOf[Double], count.asInstanceOf[Double]))
  val row = spark.sparkContext.parallelize(data).toDF("Number", "count")
  val data2 = data1.union(row)
  val data1 = data2 // --> currently this assignment is not possible
}

I also tried:

for (a <- value) {
  val num = a
  val count = a + 10
  // creating a df with the above values
  val data = Seq((num.asInstanceOf[Double], count.asInstanceOf[Double]))
  val row = spark.sparkContext.parallelize(data).toDF("Number", "count")
  val data1 = data1.union(row) // --> union with self is not possible
}

How can I achieve this in Spark?

3 Answers:

Answer 0 (score: 1)

DataFrames are immutable, so you will need to use a mutable structure. Here is an approach that may help you.

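A minimal sketch of that idea, assuming spark is an active SparkSession and value is a Seq of numbers (the names here are illustrative): accumulate the rows in a mutable buffer, then build the DataFrame once at the end.

```scala
import scala.collection.mutable.ListBuffer

import spark.implicits._ // assumes `spark` is an active SparkSession

// Collect (Number, count) pairs in a mutable buffer instead of
// repeatedly unioning immutable DataFrames.
val rows = ListBuffer.empty[(Double, Double)]
for (a <- value) {
  rows += ((a.toDouble, (a + 10).toDouble))
}
// One conversion at the end produces the final DataFrame.
val data1 = rows.toSeq.toDF("Number", "count")
```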

Answer 1 (score: 0)

Your data1 must be declared as a var:

var data1: DataFrame = ???

for (a <- value) {
  val num = a
  val count = a + 10
  // creating a df with the above values
  val data = Seq((num.toDouble, count.toDouble))
  val row = spark.sparkContext.parallelize(data).toDF("Number", "count")
  val data2 = data1.union(row)
  data1 = data2
}
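The var-reassignment pattern itself is plain Scala; stripped of the Spark calls, it looks like this (illustrative values):

```scala
// A var can be rebound to a new immutable value on each iteration,
// which is exactly what the DataFrame loop above does.
var acc = Seq.empty[(Double, Double)]
for (a <- Seq(1, 2, 3)) {
  acc = acc :+ ((a.toDouble, (a + 10).toDouble))
}
// acc == Seq((1.0, 11.0), (2.0, 12.0), (3.0, 13.0))
```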

However, I would not recommend this. It is better to convert the whole value collection (it is a Seq, presumably?) to a DataFrame and union just once; many small unions tend to be inefficient...

import spark.implicits._ // needed for .toDF on local collections

val newDF = value.toDF("Number")
  .withColumn("count", $"Number" + 10)

val result = data1.union(newDF)

Answer 2 (score: 0)

Just create a DataFrame with a for-comprehension, then union it with data1 like this:

import spark.implicits._ // needed for .toDF on local collections

val df = (for (a <- values) yield (a, a + 10)).toDF("Number", "count")
val result = data1.union(df)

This is much more efficient than performing unions inside a for loop.
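The for/yield part can be checked without Spark at all; for example:

```scala
// Build all (Number, count) pairs in one pass, then (in Spark)
// a single .toDF call would turn them into a DataFrame.
val values = Seq(1, 2, 3)
val pairs = for (a <- values) yield (a, a + 10)
// pairs == Seq((1, 11), (2, 12), (3, 13))
```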