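I have defined the following CustomPartitioner, to which I pass a map of key/value pairs. For a given ID (the key), I want it to return the corresponding value. I have already computed the partitions and put them in the map; each value is a partition number.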
import org.apache.spark.Partitioner
import org.slf4j.LoggerFactory // assumed: LoggerFactory comes from SLF4J

class CustomPartitioner(partitions: Int, accGrpMap: scala.collection.Map[Int, Int]) extends Partitioner {
  override def numPartitions: Int = partitions

  private val LOGGER = LoggerFactory.getLogger(classOf[CustomPartitioner])

  // Parse the key as an ID and look up its precomputed partition number.
  override def getPartition(key: Any): Int = {
    val accGrpId: Int = key.asInstanceOf[String].toInt
    accGrpMap(accGrpId)
  }

  override def equals(other: Any): Boolean = other match {
    case h: CustomPartitioner => h.numPartitions == numPartitions
    case _ => false
  }
}
This is how I call CustomPartitioner:
rdd.mapPartitionsWithIndex { (idx, iter) => if (idx == 0) iter.drop(1) else iter }
  .map(line => if (colIndex == -1) (null, line) else (line.split(TILDE)(colIndex), line))
  .partitionBy(new CustomPartitioner(partitionCount, partitionMap))
  .map { case (_, line) => line }
  .map(line => addEmptyColumns(line, schemaIndexArray))
  .saveAsTextFile(s"$outputPath/$fileDir")
Can someone tell me what is wrong here? How can I achieve this?
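
For reference, here is a minimal sketch of the lookup I am trying to get, written as a plain function so that it runs without Spark. It assumes that null keys (the (null, line) pairs produced when colIndex == -1), non-numeric keys, and IDs missing from the map should all go to a fallback partition. PartitionLookup, partitionFor and the fallback parameter are hypothetical names introduced only for this illustration; they are not part of Spark or of my real code.

```scala
import scala.util.Try

// Hypothetical helper (not a Spark API): resolves a record key to a partition number.
object PartitionLookup {
  // `fallback` is an assumption of this sketch: the partition used for
  // null, non-numeric, unmapped, or out-of-range keys.
  def partitionFor(key: Any,
                   accGrpMap: scala.collection.Map[Int, Int],
                   numPartitions: Int,
                   fallback: Int = 0): Int = {
    val mapped: Option[Int] = key match {
      case null      => None // e.g. the (null, line) pairs
      case s: String => Try(s.trim.toInt).toOption.flatMap(id => accGrpMap.get(id))
      case i: Int    => accGrpMap.get(i)
      case _         => None
    }
    // A partition number must lie in the range [0, numPartitions).
    mapped.filter(p => p >= 0 && p < numPartitions).getOrElse(fallback)
  }
}
```

Under these assumptions, getPartition could simply delegate to PartitionLookup.partitionFor(key, accGrpMap, partitions). Separately, since equals compares only numPartitions, two partitioners built from different maps compare as equal; comparing the map as well, and overriding hashCode alongside equals, would probably be safer.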