DataFrame join error

Time: 2017-02-01 17:12:43

Tags: scala apache-spark spark-dataframe

I created 3 DataFrames and now I want to join them. However, I am running into this error: NoSuchMethodError: org.apache.spark.rdd.RDD.mapPartitionsInternal$default$2()Z

Here is the code:

val join1 = c1_df.join(ck_df, ck_df.col("APP_ID") === c1_df.col("ID"))

val joinFinal = join1.join(c2_df, c2_df.col("APP_ID") === join1.col("APP_ID"))

joinFinal.show()

1 Answer:

Answer 0 (score: 1)

There is nothing wrong with the snippet below. Are you sure c1_df, ck_df, and c2_df are valid DataFrames? This could also be a Spark version setup issue: a NoSuchMethodError usually means the Spark version your code was compiled against differs from the one on the runtime classpath. Make sure you are running the correct version of Spark and that the SPARK_HOME variable is set accordingly (see the version-check sketch after the output below).

// Runs as-is in spark-shell, where the implicits needed for toDF are already in scope
val c1_df = sc.parallelize(1 to 10).toDF("ID")      // 10-row test frame keyed by ID
val ck_df = sc.parallelize(1 to 10).toDF("APP_ID")  // 10-row test frame keyed by APP_ID
val c2_df = sc.parallelize(1 to 10).toDF("APP_ID")  // second test frame keyed by APP_ID
val join1 = c1_df.join(ck_df, ck_df.col("APP_ID") === c1_df.col("ID"))
val joinFinal = join1.join(c2_df, c2_df.col("APP_ID") === join1.col("APP_ID"))
joinFinal.show()
+---+------+------+
| ID|APP_ID|APP_ID|
+---+------+------+
|  1|     1|     1|
|  6|     6|     6|
|  3|     3|     3|
|  5|     5|     5|
|  9|     9|     9|
|  4|     4|     4|
|  8|     8|     8|
|  7|     7|     7|
| 10|    10|    10|
|  2|     2|     2|
+---+------+------+
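
As a quick sanity check (a minimal sketch; the exact output depends on your installation), you can print which Spark build the shell is actually running and which installation it was launched from, then compare those against the version your application was compiled with:

// In spark-shell: print the Spark version on the runtime classpath
println(sc.version)

// And the installation the shell was launched from
println(sys.env.getOrElse("SPARK_HOME", "SPARK_HOME not set"))

If the job is built with sbt, pinning the Spark dependency to that same version and marking it "provided" (so the cluster's own jars are used at runtime) avoids this kind of mismatch, e.g. libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0" % "provided" — the 2.1.0 here is a placeholder; replace it with whatever sc.version reports.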