
Does anyone know how to create one large RDD from the sequence of RDDs in a DStream over a given batch interval?

For example, in the following code:

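import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

def createLargeRDD(): Unit = {

    val sparkConf = new SparkConf().setAppName("Test").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    // 3-second batch interval
    val ssc = new StreamingContext(sc, Seconds(3))

    // KafkaUtilHelper is my own helper that returns the Kafka input DStream
    val DStream = KafkaUtilHelper.RetrieveDStream(ssc)

    val transformed = DStream.transform { rdd =>
      /* Form an RDD with all of the RDDs that were put into the
         DStream variable above for the 3-second batch interval */
      rdd
    }

    // transform is lazy: it needs an output operation, and the context must be started
    transformed.print()
    ssc.start()
    ssc.awaitTermination()
}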

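So every 3 seconds, RDDs are added to that DStream variable. Is there a way to aggregate all of the RDDs that arrive in the DStream within those 3 seconds into one large RDD, and then save that RDD to HBase or some other external store?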
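A minimal sketch, assuming the standard Spark Streaming model (Spark 2.x/3.x, Scala 2.12+) in which each batch interval produces exactly one RDD per input DStream, holding all records received during that interval across its partitions. Under that assumption the `rdd` passed to `transform` or `foreachRDD` already is the "large RDD" for one 3-second batch; combining several batches calls for `window`, and writing to an external store is usually done with `foreachRDD` plus `foreachPartition`. To keep the program self-contained, `ssc.queueStream` stands in for `KafkaUtilHelper.RetrieveDStream`, the batch interval is shortened to 1 second, and `WindowedSave` and `HypotheticalHBaseSink` are made-up names; the sink only collects rows in memory (visible to the driver in local mode only) and is not an HBase API.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.collection.mutable
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}

// HYPOTHETICAL stand-in for an HBase (or other external) writer, not a real API.
// It appends rows to an in-memory queue, which the driver can only see in local mode.
object HypotheticalHBaseSink {
  val written = new ConcurrentLinkedQueue[(Long, Int)]()
  def writePartition(windowEndMs: Long, rows: Iterator[Int]): Unit =
    rows.foreach(r => written.add((windowEndMs, r)))
}

object WindowedSave {
  val batchSizes = mutable.ArrayBuffer.empty[Long]  // records per non-empty batch RDD
  val windowSizes = mutable.ArrayBuffer.empty[Long] // records per non-empty window RDD

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WindowedSave").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(1))

    // Stand-in for the Kafka stream: queueStream hands out one queued RDD per batch.
    val queue = mutable.Queue[RDD[Int]](
      sc.parallelize(1 to 4), sc.parallelize(5 to 8), sc.parallelize(9 to 12))
    val stream = ssc.queueStream(queue)

    // Runs on the driver once per batch and receives that batch's single RDD.
    stream.foreachRDD { (rdd: RDD[Int], time: Time) =>
      val n = rdd.count()
      if (n > 0) batchSizes.synchronized { batchSizes += n }
    }

    // Every 3 s, one RDD that unions the last three 1-second batch RDDs.
    // Window length == slide interval, so the windows do not overlap.
    val windowed = stream.window(Seconds(3), Seconds(3))

    windowed.foreachRDD { (rdd: RDD[Int], time: Time) =>
      val n = rdd.count()
      if (n > 0) {
        windowSizes.synchronized { windowSizes += n }
        val windowEndMs = time.milliseconds
        // Write from the executors, one call per partition, instead of collect().
        rdd.foreachPartition(rows => HypotheticalHBaseSink.writePartition(windowEndMs, rows))
      }
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(7000)
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

Because the window length equals the slide interval here, each batch lands in exactly one window and is written once; a window longer than its slide would write the same records repeatedly. A real HBase writer would typically open one connection per partition inside `foreachPartition` rather than one per record.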

Create a large RDD from the RDDs in a DStream
