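I am trying to partition HBase rows by following this example: https://www.opencore.com/blog/2016/10/efficient-bulk-load-of-hbase-using-spark/
However, my data is already stored as (String, String, String) tuples, where the first element is the row key, the second is the column name, and the third is the column value.
I tried writing an implicit Ordering so that the OrderedRDD implicit would apply: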
implicit val caseInsensitiveOrdering: Ordering[(String, String, String)] = new Ordering[(String, String, String)] {
override def compare(x: (String, String, String), y: (String, String, String)): Int = ???
}
but repartitionAndSortWithinPartitions is still not available. Is there a way to use this method with tuples?
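Answer 0 (score: 1)
repartitionAndSortWithinPartitions is only defined for pair RDDs, so the RDD must consist of (key, value) pairs rather than bare values; the implicit Ordering applies to the key type. For example: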
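import org.apache.spark.rdd.RDD
val data = List((("5", "6", "1"), 1))
val rdd: RDD[((String, String, String), Int)] = sparkContext.parallelize(data)
// Field-by-field comparison ignoring case (assumed from the name; a constant result would break the Ordering contract).
implicit val caseInsensitiveOrdering: Ordering[(String, String, String)] = new Ordering[(String, String, String)] {
  override def compare(x: (String, String, String), y: (String, String, String)): Int = {
    val byRow = x._1.compareToIgnoreCase(y._1)
    val byColumn = if (byRow != 0) byRow else x._2.compareToIgnoreCase(y._2)
    if (byColumn != 0) byColumn else x._3.compareToIgnoreCase(y._3)
  }
}
rdd.repartitionAndSortWithinPartitions(..) // takes an org.apache.spark.Partitioner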
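For the (row key, column name, column value) triples from the question, one possible approach is to re-key each triple instead of wrapping the whole triple as a key. The sketch below is only an illustration under a few assumptions: the composite key (rowKey, columnName) is a guess at what a bulk load would want sorted, and RowKeyPartitioner, TripleSortSketch and sortCells are hypothetical names, not part of Spark or of the linked article. Since Scala already supplies an implicit Ordering for tuples of Strings, no custom Ordering is needed unless a different sort order (for example a case-insensitive one) is wanted.

```scala
import org.apache.spark.{Partitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical partitioner: routes a (rowKey, columnName) key by rowKey only,
// so that every cell of a given row lands in the same partition.
class RowKeyPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int = key match {
    case (rowKey: String, _) => Math.floorMod(rowKey.hashCode, numPartitions)
  }
}

object TripleSortSketch {
  // Re-keys (rowKey, columnName, columnValue) triples as ((rowKey, columnName), columnValue).
  // The result is a pair RDD, so repartitionAndSortWithinPartitions becomes available;
  // the default tuple Ordering sorts by rowKey first, then by columnName.
  def sortCells(cells: RDD[(String, String, String)], partitions: Int): RDD[((String, String), String)] =
    cells
      .map { case (rowKey, column, value) => ((rowKey, column), value) }
      .repartitionAndSortWithinPartitions(new RowKeyPartitioner(partitions))

  def main(args: Array[String]): Unit = {
    // Local master and sample data are assumptions made for the demonstration only.
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("triple-sort-sketch"))
    try {
      val cells = sc.parallelize(Seq(("r2", "b", "4"), ("r1", "b", "2"), ("r1", "a", "1"), ("r2", "a", "3")))
      // glom() exposes each partition as an array, which makes the per-partition order visible.
      sortCells(cells, partitions = 2).glom().collect().foreach(p => println(p.mkString(", ")))
    } finally sc.stop()
  }
}
```

Partitioning on the row key alone while sorting on the full composite key keeps all cells of a row together and ordered by column within each partition. If a custom sort order is required, an implicit Ordering[(String, String)] in scope at the call to repartitionAndSortWithinPartitions would take the place of the default one.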