I want to build a DataFrame and keep appending new rows to it, as shown in the following example.
for (a <- value) {
  val num = a
  val count = a + 10
  // create a DataFrame from the values above
  val data = Seq((num.asInstanceOf[Double], count.asInstanceOf[Double]))
  val row = spark.sparkContext.parallelize(data).toDF("Number", "count")
  val data2 = data1.union(row)
  val data1 = data2 // --> currently this assignment is not possible
}
I also tried:
for (a <- value) {
  val num = a
  val count = a + 10
  // create a DataFrame from the values above
  val data = Seq((num.asInstanceOf[Double], count.asInstanceOf[Double]))
  val row = spark.sparkContext.parallelize(data).toDF("Number", "count")
  val data1 = data1.union(row) // --> union with self is not possible
}
How can I achieve this in Spark?
Answer 0 (score: 1)
DataFrames are immutable, so you will need to use a mutable structure. Here is a solution that may help you.
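One way to follow this advice is to accumulate the rows in a plain Scala mutable collection and convert to a DataFrame once at the end. A minimal sketch, assuming `value` is a `Seq[Double]` (the sample input here is hypothetical); the Spark conversion itself is shown as a comment since it needs a live SparkSession:

```scala
import scala.collection.mutable.ListBuffer

val value: Seq[Double] = Seq(1.0, 2.0, 3.0) // hypothetical input

// accumulate (Number, count) pairs in a mutable buffer instead of
// unioning a one-row DataFrame per iteration
val rows = ListBuffer.empty[(Double, Double)]
for (a <- value) {
  rows += ((a, a + 10))
}

// with a SparkSession in scope (import spark.implicits._),
// a single conversion replaces all the per-iteration unions:
// val df = rows.toSeq.toDF("Number", "count")
```

This keeps the loop cheap and touches Spark only once, which avoids building a long union lineage.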
Answer 1 (score: 0)
Your data1 must be declared as a var:
var data1: DataFrame = ???
for (a <- value) {
  val num = a
  val count = a + 10
  // create a DataFrame from the values above
  val data = Seq((num.toDouble, count.toDouble))
  val row = spark.sparkContext.parallelize(data).toDF("Number", "count")
  val data2 = data1.union(row)
  data1 = data2
}
However, I would not recommend this. It is better to convert the entire value (it must be a Seq?) into a DataFrame and union it just once; many unions tend to be inefficient:
val newDF = value.toDF("Number")
  .withColumn("count", $"Number" + 10)
val result = data1.union(newDF)
Answer 2 (score: 0)
Just build a single DataFrame with a for comprehension and then union it with data1, like this:
val df = (for (a <- value) yield (a, a + 10)).toDF("Number", "count")
val result = data1.union(df)
This is far more efficient than performing a union inside the for loop.
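The for-comprehension part of this answer can be checked without a cluster. A minimal sketch, assuming `value` is a `Seq[Double]` (the sample input is hypothetical); the `toDF`/`union` calls are left as comments because they require a SparkSession and the existing `data1`:

```scala
val value: Seq[Double] = Seq(1.0, 2.0, 3.0) // hypothetical input

// build all (Number, count) pairs up front with a single comprehension
val pairs = for (a <- value) yield (a, a + 10)

// with spark.implicits._ in scope:
// val df = pairs.toDF("Number", "count")
// val result = data1.union(df)
```

Because the pairs are materialized first, Spark sees one `union` instead of one per loop iteration, so the query plan stays shallow.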