以下代码因AnalysisException失败:sc.version String = 1.6.0
case class Person(name: String, age: Long)
val caseClassDF = Seq(Person("Andy", 32)).toDF()
caseClassDF.count()
val seq = Seq(1)
val rdd = sqlContext.sparkContext.parallelize(seq)
val df2 = rdd.toDF("Counts")
df2.count()
val withCounts = caseClassDF.withColumn("duration", df2("Counts"))
答案 0 :(得分:1)
出于某种原因,它适用于UDF:
import org.apache.spark.sql.functions.udf
case class Person(name: String, age: Long, day: Int)
val caseClassDF = Seq(Person("Andy", 32, 1), Person("Raman", 22, 1), Person("Rajan", 40, 1), Person("Andy", 42, 2), Person("Raman", 42, 2), Person("Rajan", 50, 2)).toDF()
val calculateCounts= udf((x: Long, y: Int) =>
x+y)
val df1 = caseClassDF.withColumn("Counts", calculateCounts($"age", $"day"))
df1.show
+-----+---+---+------+
| name|age|day|Counts|
+-----+---+---+------+
| Andy| 32| 1| 33|
|Raman| 22| 1| 23|
|Rajan| 40| 1| 41|
| Andy| 42| 2| 44|
|Raman| 42| 2| 44|
|Rajan| 50| 2| 52|
+-----+---+---+------+
答案 1 :(得分:0)
caseClassDF.withColumn(&#34; duration&#34;, df2(&#34; Counts&#34;)),这里的列应该是相同的数据帧(在你的情况下< EM> caseClassDF )。 AFAIK,Spark不允许withColumn中的不同DataFrame的列。
PS:我是Spark 1.6.x的用户,不确定是否已经在Spark 2.x中出现了