Is it possible to create DStreams with new names and destroy old DStreams at runtime?
# Read the DStream
inputDstream = ssc.textFileStream("./myPath/")
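Example: I am reading a file named cvd_filter.txt in which every line contains a string that should act as a filter condition on the DStream. This file gets updated with new values (lines may also be appended):
For example, at 10:00, cat cvd_filter.txt shows: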
"1001"
"1002"
"1003"
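# Read cvd_filter.txt every 5 minutes and create/destroy DStreams accordingly.
with open("cvd_filter.txt") as f:
    # Strip the trailing newline so each entry can be used as a substring filter.
    content = [line.strip() for line in f if line.strip()]
# One filtered DStream per filter string; the default argument binds each value.
dstream_content = {c: inputDstream.filter(lambda a, c=c: c in a) for c in content}
# At this point dstream_1001, dstream_1002 and dstream_1003 should have been created.
# Now do some operation on the individual DStreams.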
At 10:05, cat cvd_filter.txt shows:
"1004"
"1002"
"1003"
# Create dstream_1004 for the new filter string and destroy only dstream_1001,
# but retain dstream_1002 and dstream_1003.
# At this point dstream_1004, dstream_1002 and dstream_1003 should be present.
# Now do some operation on the individual DStreams.
Answer 0 (score: 0)
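No. New streams, or new operations on DStreams, cannot be added to a streaming context that is already running.
I suggest modeling your use case with foreachRDD, which leaves you free to perform arbitrary operations on the underlying RDD of each batch.
For example: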
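val dstream = ??? // original DStream
dstream.foreachRDD { rdd =>
  // Re-read the filter file for every batch so that changes are picked up.
  val filters: Seq[String] = ??? // read file
  // One filtered RDD per filter string.
  val filteredRDDs = filters.map(f => rdd.filter(elem => elem.contains(f)))
  ...
}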
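Then express the operations you need on the individual filtered RDDs. DStreams delegate all transformation operations to the underlying RDDs, so you should be able to express your business logic this way.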
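Since the question uses Python, the per-batch logic can also be sketched without Spark. This is only an illustration under assumptions: read_filters, split_by_filters and process_batch are hypothetical helper names, a batch is modeled as a plain list of strings, and each non-empty line of the filter file is used verbatim as a substring filter.

```python
def read_filters(path):
    """Return the non-empty lines of the filter file, stripped of whitespace."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def split_by_filters(records, filters):
    """Map each filter string to the records that contain it."""
    return {flt: [r for r in records if flt in r] for flt in filters}


def process_batch(records, filter_path):
    """Re-read the filter file for this batch, then apply every filter."""
    return split_by_filters(records, read_filters(filter_path))


# With PySpark, the same idea would be wired in roughly like this
# (assumes inputDstream from the question; not executed here):
#
#   def handle(rdd):
#       for flt in read_filters("cvd_filter.txt"):
#           filtered = rdd.filter(lambda line, flt=flt: flt in line)
#           ...  # per-filter operation on `filtered`
#
#   inputDstream.foreachRDD(handle)
```

Because the file is re-read on every batch, a filter that disappears from cvd_filter.txt simply stops being applied from the next batch on, and a new one starts being applied, so nothing has to be created or destroyed on the running context.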