How to customize splits and partitioning in Spark

Date: 2019-02-04 01:11:37

Tags: apache-spark hadoop rdd

How do I customize splits when repartitioning in Spark? I am looking for the Spark equivalent of MapReduce's hasMoreKeyValue() / nextKeyValue(): a way to customize the split boundaries in parallel processing across transformations, and to customize the iteration over the next value fed to a transformation.
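Spark's RDD API has no direct hasMoreKeyValue()/nextKeyValue() hook; record boundaries are decided at read time by the Hadoop InputFormat, and for simple cases the Hadoop configuration key `textinputformat.record.delimiter` changes where a record ends. A minimal sketch of the idea in plain Python follows; the `split_records` helper, the file name, and the `@closing` delimiter are illustrative assumptions, not Spark APIs:

```python
# Changing the record boundary at read time (untested PySpark sketch):
#
#   rdd = sc.newAPIHadoopFile(
#       "tickets.txt",
#       "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
#       "org.apache.hadoop.io.LongWritable",
#       "org.apache.hadoop.io.Text",
#       conf={"textinputformat.record.delimiter": "@closing"})
#
# The splitting rule that conf key applies is essentially this:

def split_records(text, delimiter):
    """Split raw text into records on a custom delimiter, dropping empties."""
    return [part.strip() for part in text.split(delimiter) if part.strip()]

raw = "t1 @Problem ... @Action ... @closing t2 @Problem ... @Action ... @closing"
records = split_records(raw, "@closing")
# records[0] is the whole first ticket, records[1] the whole second ticket.
```

With the delimiter set to a closing annotation, each record handed to a transformation is a complete ticket rather than a single line.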

Example scenario:
The data is like ticket threads: each thread is a chain of elements, and each element has 3 parts.

1. Problem OR hand-over recommendation, annotated as @Problem or @HandOver
2. Action taken so far, annotated as @Action
3. Hand-over recommendation OR closing note, annotated as @HandOver or @closing
The tail of an element, if it is not a closing note, is the head of the next element. Similarly, the head of an element, if it is not annotated as @Problem, is the tail of the previous element.

In other words, a @HandOver part can act as the tail of one element and the head of the next.
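The chaining rule above can be sketched as a plain function that walks the annotated parts of one thread and emits elements of (head, action, tail), advancing by two parts when the tail is a shared @HandOver. The function name and the (annotation, text) pair representation are illustrative assumptions:

```python
def build_elements(parts):
    """Group one thread's annotated parts into elements of (head, action, tail).

    `parts` is a list of (annotation, text) pairs in thread order.  A
    "@HandOver" tail is shared: it is also the head of the next element,
    so the walk advances by 2 parts instead of 3.  A "@closing" tail
    ends the thread.
    """
    elements = []
    head = 0
    while head + 3 <= len(parts):
        elements.append(tuple(parts[head:head + 3]))
        tail_tag = parts[head + 2][0]
        if tail_tag == "@HandOver":
            head += 2   # shared part: this tail is the next element's head
        else:
            break       # "@closing" ends the thread
    return elements

thread = [("@Problem", "p1"), ("@Action", "a1"), ("@HandOver", "h1"),
          ("@Action", "a2"), ("@closing", "c1")]
# Yields two elements; the @HandOver part appears in both.
```

Applied per thread (for example inside mapPartitions once each partition holds whole threads), this gives one record per complete element.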

I have already repartitioned the data that was loaded from a text file into an RDD. Each partition may contain multiple elements, but it should not contain partial elements.
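One way to keep partial elements out of a partition is to key each element by its thread id and partition by that key, so every element of a thread lands in the same partition. A minimal sketch of such a partition function, with the function name and usage hypothetical:

```python
def thread_partition(thread_id, num_partitions):
    """Map a thread id to a partition index.

    Every element of the same thread gets the same index, so no
    partition ever sees only part of a thread's element chain.
    """
    return hash(thread_id) % num_partitions

# Hypothetical usage once the RDD holds (thread_id, element) pairs:
#   n = 8
#   partitioned = rdd.partitionBy(n, lambda tid: thread_partition(tid, n))
```

The choice to partition by thread id (rather than by raw line count) is what guarantees whole elements per partition, at the cost of possible skew if threads vary greatly in length.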

0 Answers:

There are no answers yet.