Question

I'm doing a secondary sort in Spark Streaming using the approach mentioned here, but it fails with the following error:

repartitionAndSortWithinPartitions is not a member of org.apache.spark.streaming.dstream.DStream

Code:

def ProcessDStream(lines: DStream[EventData]) {
  val dataSetrawSorted = lines.repartitionAndSortWithinPartitions(new DataSetPartitioner(1000))
}

So how can I achieve the same thing with a DStream?

Answer 1

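Use transform. repartitionAndSortWithinPartitions is defined on key-value RDDs, not on DStream, and transform lets you apply an RDD-to-RDD function to each batch of the stream: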

stream.transform { rdd => rdd.repartitionAndSortWithinPartitions(...) }
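
A fuller version might look like the sketch below. It is only an illustration: the fields of EventData and the body of DataSetPartitioner are not shown in the question, so both are hypothetical stand-ins here, as is the choice of (deviceId, timestamp) as the composite key. The stream has to be mapped to key-value pairs first, because repartitionAndSortWithinPartitions needs a key with an implicit Ordering. On older Spark versions you may also need `import org.apache.spark.SparkContext._` for the implicit conversion.

```scala
import org.apache.spark.Partitioner
import org.apache.spark.streaming.dstream.DStream

// Hypothetical event type; the real EventData from the question is not shown.
case class EventData(deviceId: String, timestamp: Long, payload: String)

// Hypothetical stand-in for the question's DataSetPartitioner.
// It partitions on the first element of the composite key only, so all
// events of one device land in the same partition.
class DataSetPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int = key match {
    case (deviceId: String, _) =>
      val h = deviceId.hashCode % numPartitions
      if (h < 0) h + numPartitions else h // keep the index non-negative
    case other =>
      throw new IllegalArgumentException(s"Unexpected key: $other")
  }
}

object SecondarySort {
  def processDStream(lines: DStream[EventData]): DStream[((String, Long), EventData)] = {
    // Build a composite key: partition by deviceId, sort by (deviceId, timestamp).
    val keyed = lines.map(e => ((e.deviceId, e.timestamp), e))

    // transform exposes the RDD behind each batch, where the pair-RDD method exists.
    keyed.transform { rdd =>
      rdd.repartitionAndSortWithinPartitions(new DataSetPartitioner(1000))
    }
  }
}
```

Within each partition of each batch, records then come out ordered by the composite key, using the default tuple Ordering. The sort applies per batch only; it does not order records across batches.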
