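I have written a custom receiver to receive the stream generated by one of our applications. The receiver starts a process, gets the stream, and then calls store. However, the receive method gets called multiple times. I have written what I believe is a proper loop break condition, but it does not work. How can I ensure that it reads the data only once and does not re-read data that has already been processed?
Here is my custom receiver code: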
class MyReceiver() extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) with Logging {
  def onStart() {
    new Thread("Splunk Receiver") {
      override def run() { receive() }
    }.start()
  }
  def onStop() {
  }
  private def receive() {
    try {
      /* My code to run a process and get the stream */
      val reader = new ResultsReader(job.getResults()); // ResultsReader is the reader for the application
      var event: String = reader.getNextLine;
      while (!isStopped || event != null) {
        store(event);
        event = reader.getNextLine;
      }
      reader.close()
    } catch {
      case t: Throwable =>
        restart("Error receiving data", t)
    }
  }
}
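Where am I going wrong?
Problem 1) The job and the stream read happen every 2 seconds, and the same data piles up. So for 60 lines of data, I end up with 1800 or more in total.
Streaming code: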
val conf = new SparkConf
conf.setAppName("str1");
conf.setMaster("local[2]")
conf.set("spark.driver.allowMultipleContexts", "true");
val ssc = new StreamingContext(conf, Minutes(2));
val customReceiverStream = ssc.receiverStream(new MyReceiver)
println(" searching ");
//if(customReceiverStream.count() > 0 ){
customReceiverStream.foreachRDD(x => {println("=====>"+ x.count());x.count()});
//}
ssc.start();
ssc.awaitTermination()
Note: I am trying this on a local cluster, with master set to local[2].
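One thing that may be worth checking is the loop condition in receive(). As written, `!isStopped || event != null` stays true for as long as the receiver is not stopped, even after getNextLine has returned null, so the loop does not end when the data runs out. Below is a minimal sketch of the same loop with `&&` instead, using plain Scala with no Spark dependency. `FakeReader`, `LoopConditionSketch`, and `drain` are hypothetical names made up for illustration, and the assumption that the real reader returns null at end of input is taken from how the code above uses it. This only illustrates the loop termination and is not confirmed to be the cause of the repeated reads.

```scala
// Hypothetical stand-in for the real reader: returns null once the input
// is exhausted, mirroring how reader.getNextLine is used in the question.
class FakeReader(lines: List[String]) {
  private val it = lines.iterator
  def getNextLine: String = if (it.hasNext) it.next() else null
}

object LoopConditionSketch {
  // Reads until the receiver is stopped OR the input runs out.
  // The ListBuffer stands in for the receiver's store(event) call.
  def drain(reader: FakeReader, isStopped: () => Boolean): List[String] = {
    val stored = scala.collection.mutable.ListBuffer.empty[String]
    var event = reader.getNextLine
    while (!isStopped() && event != null) {
      stored += event
      event = reader.getNextLine
    }
    stored.toList
  }

  def main(args: Array[String]): Unit = {
    val out = drain(new FakeReader(List("a", "b", "c")), () => false)
    println(out.size) // each of the three lines is stored exactly once
  }
}
```

With `||` in place of `&&`, the same loop would keep running after the last line and keep handing null to the storing step until the receiver is stopped.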