I am trying to build some skills in Spark SQL, mainly with DataFrames. What I am trying to do is shown in the code I am using:
import org.apache.spark.sql.functions.{col, udf}

// Two small DataFrames that share an "ID" column
val profDF = Seq((1, "James", "Detective"),
                 (2, "Harvey", "Captain"),
                 (3, "Barbara", "Club Owner")).toDF("ID", "Name", "Occ")
val persDF = Seq((1, 30, "Single"),
                 (2, 35, "Married"),
                 (3, 30, "Single")).toDF("ID", "Age", "Status")

// UDF returning a tuple, which shows up as a struct column with fields _1 and _2
val upperAdd2: (String, Int) => (String, Int) = (name, age) => (name.toUpperCase, age + 2)
val upAddUDF = udf(upperAdd2)

profDF.join(persDF, profDF("ID") === persDF("ID"))
  .withColumn("Result", upAddUDF(profDF("Name"), persDF("Age")))
  .withColumn("Caps Name", col("Result._1"))
  .withColumn("More Age", col("Result._2"))
  .drop("Result")
  .show
Executing the last statement produces the following error:
org.apache.spark.sql.AnalysisException: Reference 'ID' is ambiguous, could be: ID#4, ID#12.;
I believe I have specified the DataFrame for "ID" correctly, because the following statement works fine:
profDF.join(persDF, profDF("ID") === persDF("ID"))
  .withColumn("Result", upAddUDF(profDF("Name"), persDF("Age")))
  .show
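Do I need to specify anything else here?
I am using Spark 1.6.0, and the code is for learning purposes only. If it can be improved further, please let me know.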
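
One possible way around the error, offered as a minimal sketch rather than a confirmed diagnosis: the join on `profDF("ID") === persDF("ID")` leaves two columns named `ID` in the result, and a later step that re-selects columns by name may not be able to tell them apart. The sketch below assumes that joining on the column name (`Seq("ID")`) keeps a single `ID` column, and it replaces the `withColumn`/`drop` chain with one explicit `select`. The object name `JoinSketch`, the app name, and the `local[*]` master are illustrative assumptions, not part of the original code.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical standalone wrapper; in spark-shell only the body of main is needed.
object JoinSketch {
  // Same logic as the UDF in the question: upper-case the name, add 2 to the age.
  val upperAdd2: (String, Int) => (String, Int) =
    (name, age) => (name.toUpperCase, age + 2)

  def main(args: Array[String]): Unit = {
    // Assumed local setup for a Spark 1.6-style application.
    val sc = new SparkContext(new SparkConf().setAppName("join-sketch").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val profDF = Seq((1, "James", "Detective"),
                     (2, "Harvey", "Captain"),
                     (3, "Barbara", "Club Owner")).toDF("ID", "Name", "Occ")
    val persDF = Seq((1, 30, "Single"),
                     (2, 35, "Married"),
                     (3, 30, "Single")).toDF("ID", "Age", "Status")

    val upAddUDF = udf(upperAdd2)

    // Joining on the column name yields one "ID" column instead of two.
    val joined = profDF.join(persDF, Seq("ID"))
      .withColumn("Result", upAddUDF(col("Name"), col("Age")))

    // Select the wanted columns explicitly, so no separate drop step is needed.
    joined.select(
      col("ID"), col("Name"), col("Occ"), col("Age"), col("Status"),
      col("Result._1").as("Caps Name"),
      col("Result._2").as("More Age")
    ).show()

    sc.stop()
  }
}
```

If both `ID` columns really are needed, renaming one of them before the join (for example with `withColumnRenamed`) would be another way to keep the names unique.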