How can I customize the splits when repartitioning in Spark? I am looking for the Spark equivalent of MapReduce's hasMoreKeyValue()/nextKeyValue(): a way to customize the split boundaries used for parallel processing across transformations, and to customize how the next value handed to a transformation is iterated.
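To show the level of control I mean: the closest I have found is RDD.mapPartitions, where the partition's Iterator gives the same pull-style hasNext/next access that hasMoreKeyValue()/nextKeyValue() give in a RecordReader, but only within boundaries Spark has already chosen. A minimal sketch (assuming spark-shell, so sc is predefined; the path and the grouping are placeholders):

```scala
// Hypothetical input path; `sc` is the SparkContext provided by spark-shell.
val lines = sc.textFile("hdfs:///path/to/ticket-threads")

// Pull-style iteration inside one partition: iter.hasNext / iter.next()
// correspond to hasMoreKeyValue() / nextKeyValue(), but the partition
// boundaries themselves are still the byte-offset splits Spark chose,
// so an element can still be cut in half between two partitions.
val grouped = lines.mapPartitions { iter =>
  val out = scala.collection.mutable.ListBuffer[List[String]]()
  while (iter.hasNext) {
    out += List(iter.next())   // placeholder: real element grouping would go here
  }
  out.iterator
}
```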
Example scenario:
The data is like ticket threads: each thread is a chain of elements, and each element has 3 parts.
1. Problem or hand-over recommendation, annotated as @Problem or @HandOver
2. Action taken so far, annotated as @Action
3. Hand-over recommendation or closing note, annotated as @HandOver or @closing.
The tail of an element, if it is not a closing note, is the head of the next element. Similarly, the head of an element, if it is not annotated as @Problem, is the tail of the previous element. In other words, a @HandOver can act as the tail of one element and the head of the next.
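To make the structure concrete, here is how I picture one element; the names Part/Problem/Action/HandOver/Closing/Element are mine, not from the data:

```scala
// Hypothetical model of the structure described above.
sealed trait Part
case class Problem(text: String)  extends Part   // @Problem
case class Action(text: String)   extends Part   // @Action
case class HandOver(text: String) extends Part   // @HandOver: tail of one element, head of the next
case class Closing(text: String)  extends Part   // @closing: terminates the thread

// One element = head + action + tail; a thread is a chain of elements in
// which a @HandOver tail is reused as the head of the following element.
case class Element(head: Part, action: Action, tail: Part)
```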
I need to repartition data that has already been loaded from a text file into an RDD. Each partition may contain multiple elements, but no partition should contain a partial element.
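What I have come up with so far only sidesteps the question: read each thread file whole, split it into complete elements myself, and only then repartition, so a partition can never receive a partial element. A sketch under those assumptions (hypothetical path; the @-tag parsing is guesswork; the local master is only for a quick test):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("ticket-elements").setMaster("local[*]"))

// wholeTextFiles keeps each file intact, unlike textFile, whose byte-offset
// splits can cut an element across two partitions.
val threads = sc.wholeTextFiles("hdfs:///path/to/ticket-threads")

// Split a thread into elements: an element ends at a @closing line or at a
// @HandOver line, and a @HandOver line is repeated as the head of the next
// element (the chaining rule described above).
val elements = threads.flatMap { case (_, content) =>
  val out     = scala.collection.mutable.ListBuffer[List[String]]()
  var current = scala.collection.mutable.ListBuffer[String]()
  for (line <- content.split("\n")) {
    current += line
    val l = line.toLowerCase
    if (l.contains("@closing") || l.contains("@handover")) {
      out += current.toList
      current = scala.collection.mutable.ListBuffer[String]()
      if (l.contains("@handover")) current += line   // hand-over doubles as next head
    }
  }
  if (current.nonEmpty) out += current.toList
  out
}

// Each record is now a complete element, so an ordinary repartition can
// never place a partial element in a partition.
val partitioned = elements.repartition(8)
```

This only works because each file is parsed in one place; what I am really after is making the boundary decision at split time, the way a custom RecordReader does in MapReduce.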