Problem adding a Column to a DataFrame

Asked: 2016-09-08 06:29:40

Tags: apache-spark spark-dataframe

The following code fails with an AnalysisException (sc.version returns String = 1.6.0):

case class Person(name: String, age: Long)
val caseClassDF = Seq(Person("Andy", 32)).toDF()
caseClassDF.count()

val seq = Seq(1)
val rdd = sqlContext.sparkContext.parallelize(seq)
val df2 = rdd.toDF("Counts")
df2.count()

// This line throws the AnalysisException: df2("Counts") resolves
// against df2, not against caseClassDF
val withCounts = caseClassDF.withColumn("duration", df2("Counts"))

2 Answers:

Answer 0 (score: 1)

For some reason, it works with a UDF:

import org.apache.spark.sql.functions.udf

case class Person(name: String, age: Long, day: Int)
val caseClassDF = Seq(
  Person("Andy", 32, 1), Person("Raman", 22, 1), Person("Rajan", 40, 1),
  Person("Andy", 42, 2), Person("Raman", 42, 2), Person("Rajan", 50, 2)
).toDF()

// Both UDF inputs resolve against caseClassDF, so withColumn succeeds
val calculateCounts = udf((x: Long, y: Int) => x + y)

val df1 = caseClassDF.withColumn("Counts", calculateCounts($"age", $"day"))
df1.show

+-----+---+---+------+
| name|age|day|Counts|
+-----+---+---+------+
| Andy| 32|  1|    33|
|Raman| 22|  1|    23|
|Rajan| 40|  1|    41|
| Andy| 42|  2|    44|
|Raman| 42|  2|    44|
|Rajan| 50|  2|    52|
+-----+---+---+------+
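
A side note, not in the original answer: the UDF works because both $"age" and $"day" belong to caseClassDF, so there is no cross-DataFrame reference. For simple arithmetic like this, a plain built-in column expression should behave the same, no UDF needed (a sketch):

// same result with built-in column arithmetic instead of a UDF
val df1NoUdf = caseClassDF.withColumn("Counts", $"age" + $"day")
df1NoUdf.show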

Answer 1 (score: 0)

In caseClassDF.withColumn("duration", df2("Counts")), the column you add must come from the same DataFrame (in your case, caseClassDF). AFAIK, Spark does not allow referencing a column from a different DataFrame inside withColumn.

PS: I'm a Spark 1.6.x user; I'm not sure whether this has changed in Spark 2.x.
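
For completeness, a sketch of one way to combine columns from two DataFrames (my own suggestion, not from either answer): join them first, so that every column resolves against a single DataFrame. With the single-row df2 from the question, a plain Cartesian join is enough:

// df2 has exactly one row, so joining with no condition attaches its
// Counts value to every row of caseClassDF; the result is one DataFrame,
// and "Counts" now resolves against it
val joined = caseClassDF.join(df2)
val withCounts = joined.withColumnRenamed("Counts", "duration")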