type HashPartitioner is not a member of org.apache.spark.sql.SparkSession

Date: 2017-05-24 19:16:10

Tags: apache-spark partitioner

I was experimenting with Spark's HashPartitioner in spark-shell. The error is shown below:

scala> val data = sc.parallelize(List((1, 3), (2, 4), (3, 6), (3, 7)))
data: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> val partitionedData = data.partitionBy(new spark.HashPartitioner(2))
<console>:26: error: type HashPartitioner is not a member of org.apache.spark.sql.SparkSession
       val partitionedData = data.partitionBy(new spark.HashPartitioner(2))
                                                        ^

scala> val partitionedData = data.partitionBy(new org.apache.spark.HashPartitioner(2))
partitionedData: org.apache.spark.rdd.RDD[(Int, Int)] = ShuffledRDD[1] at partitionBy at <console>:26

The third command works, but the second one fails. Why does spark-shell resolve spark.HashPartitioner against org.apache.spark.sql.SparkSession instead of against the org.apache.spark package?

1 answer:

Answer 0 (score: 6)

In spark-shell, spark is a predefined SparkSession object, not the org.apache.spark package, so the prefix spark. is resolved against the SparkSession.

You should import org.apache.spark.HashPartitioner or use the fully qualified class name, for example:

import org.apache.spark.HashPartitioner

val partitionedData = data.partitionBy(new HashPartitioner(2))
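The shadowing can be reproduced in plain Scala without Spark. The sketch below uses hypothetical names (`Outer`, `Repl`) to mimic how spark-shell's predefined `spark` value hides the `spark` package prefix, and why a fully qualified name still resolves:

```scala
// Minimal sketch in plain Scala (no Spark required); Outer and Repl are
// illustrative names, not part of any Spark API.
object Outer {
  // Stands in for the real org.apache.spark package.
  object spark {
    class HashPartitioner(val partitions: Int)
  }
}

object Repl {
  import Outer._ // brings the package-like `spark` object into scope

  // spark-shell predefines a value named `spark` (the SparkSession);
  // this String plays that role here and shadows Outer.spark.
  val spark = "I am the SparkSession"

  // new spark.HashPartitioner(2)  // does not compile: String has no such member

  // A fully qualified name bypasses the shadowing, just as
  // org.apache.spark.HashPartitioner did in the question.
  val partitioner = new Outer.spark.HashPartitioner(2)
}
```

Importing `HashPartitioner` directly, as the answer suggests, sidesteps the ambiguous `spark.` prefix altogether.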