I have input data in the following format:
RDD[
(Map1, RecordA),
(Map2, RecordX),
(Map1, RecordB),
(Map2, RecordY),
(Map1, RecordC),
(Map2, RecordZ)
]
The expected output format is a list of RDDs:
List[
RDD[RecordA, RecordB, RecordC],
RDD[RecordX, RecordY, RecordZ]
]
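I want the records grouped into inner RDDs by their key (Map1, Map2), and I want an outer list that holds those inner RDDs.
I tried the reduceByKey and aggregateByKey APIs, but have not succeeded so far.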
Real-world example:
RDD[
(Map("a"->"xyz", "b"->"per"), CustomRecord("test1", 1, "abc")),
(Map("a"->"xyz", "b"->"per"), CustomRecord("test2", 1, "xyz")),
(Map("a"->"xyz", "b"->"lmm"), CustomRecord("test3", 1, "blah")),
(Map("a"->"xyz", "b"->"lmm"), CustomRecord("test4", 1, "blah"))
]
final case class CustomRecord(
  string1: String,
  int1: Int,
  string2: String)
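One possible way to sketch the desired split is to collect the distinct keys on the driver and filter the input once per key. This assumes the number of distinct keys is small enough to collect. The names `SplitByKey` and `splitByKey` and the local `SparkContext` setup are illustrative assumptions, not part of the original code.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

final case class CustomRecord(string1: String, int1: Int, string2: String)

object SplitByKey {
  // Hypothetical helper: returns one RDD per distinct key.
  // Assumption: the distinct keys fit in driver memory.
  def splitByKey(
      input: RDD[(Map[String, String], CustomRecord)]): List[RDD[CustomRecord]] = {
    val keys = input.keys.distinct().collect().toList
    // One filter pass over the input per key; order of the list is not guaranteed.
    keys.map(k => input.filter { case (key, _) => key == k }.values)
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for a quick test run.
    val conf = new SparkConf().setAppName("split-by-key").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val input = sc.parallelize(Seq(
        (Map("a" -> "xyz", "b" -> "per"), CustomRecord("test1", 1, "abc")),
        (Map("a" -> "xyz", "b" -> "per"), CustomRecord("test2", 1, "xyz")),
        (Map("a" -> "xyz", "b" -> "lmm"), CustomRecord("test3", 1, "blah")),
        (Map("a" -> "xyz", "b" -> "lmm"), CustomRecord("test4", 1, "blah"))
      )).cache() // cached because each key triggers another scan of the input

      val result: List[RDD[CustomRecord]] = splitByKey(input)
      result.foreach(rdd => println(rdd.collect().toList))
    } finally {
      sc.stop()
    }
  }
}
```

Each inner RDD rescans the input, so this only makes sense for a modest number of keys. If a single RDD is acceptable instead of a list, `input.groupByKey()` yields an `RDD[(Map[String, String], Iterable[CustomRecord])]` in one pass.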
Thanks for your help.