How to make a CustomReceiver read an InputStream only once

Date: 2016-01-12 15:18:15

Tags: apache-spark spark-streaming

I wrote a custom receiver to consume a stream generated by one of our applications. The receiver starts a process, obtains the stream, and then calls store. However, the receive method effectively runs more than once: I believe I wrote a correct loop-exit condition, but the loop does not terminate as intended. How can I make sure the stream is read only once, and that already-processed data is not read again?

Here is my custom receiver code:

class MyReceiver() extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) with Logging {

  def onStart() {
    new Thread("Splunk Receiver") {
      override def run() { receive() }
    }.start()
  }

  def onStop() {
  }

  private def receive() {
    try {
      /* My code to run a process and get the stream */
      val reader = new ResultsReader(job.getResults()) // ResultsReader is the reader for the application
      var event: String = reader.getNextLine

      while (!isStopped || event != null) {
        store(event)
        event = reader.getNextLine
      }
      reader.close()
    } catch {
      case t: Throwable =>
        restart("Error receiving data", t)
    }
  }
}

Where did I go wrong?

Question 1) A job runs and the stream is read again every 2 seconds, and the same data keeps piling up. So for 60 lines of source data I end up with 1800 lines or more in total.
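One likely culprit (an observation, not a confirmed fix): the loop condition `!isStopped || event != null` remains true after the stream is exhausted, for as long as the receiver has not been stopped. So `store(null)` gets called, the resulting exception lands in the `catch` block, and `restart(...)` re-runs `receive()` from the top, re-reading the same data on every restart. With `&&` the loop exits as soon as either the receiver stops or the reader returns null. A minimal plain-Scala sketch of the `&&` form (no Spark here; `stopped` and `nextLine` are stand-ins for `isStopped` and `ResultsReader.getNextLine`):

```scala
object LoopConditionDemo {
  // Simulates the receive loop over a finite stream of lines.
  // nextLine() returns null at end of stream, like getNextLine.
  def readAll(lines: List[String]): List[String] = {
    val it = lines.iterator
    var stopped = false // stand-in for Receiver.isStopped
    def nextLine(): String = if (it.hasNext) it.next() else null

    val stored = scala.collection.mutable.ListBuffer[String]()
    var event: String = nextLine()
    // With `||`, once event == null this loop would keep spinning
    // (and store(null) would throw in a real receiver) as long as
    // !stopped is true. `&&` stops at end-of-stream or on stop:
    while (!stopped && event != null) {
      stored += event // store(event)
      event = nextLine()
    }
    stored.toList
  }

  def main(args: Array[String]): Unit = {
    println(readAll(List("a", "b", "c"))) // List(a, b, c)
  }
}
```

Each line is then stored exactly once per run of the loop; whether the receiver is restarted (and hence re-reads the source) is a separate question about how the external job is re-launched.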

The streaming (driver) code:

val conf = new SparkConf
conf.setAppName("str1")
conf.setMaster("local[2]")
conf.set("spark.driver.allowMultipleContexts", "true")

val ssc = new StreamingContext(conf, Minutes(2))

val customReceiverStream = ssc.receiverStream(new MyReceiver)

println(" searching ")
//if (customReceiverStream.count() > 0) {
customReceiverStream.foreachRDD(x => { println("=====>" + x.count()); x.count() })
//}
ssc.start()
ssc.awaitTermination()

Note: I am trying this on a local cluster, with the master set to local[2].

0 Answers:

There are no answers yet.