I have the following stream in NetCat:
$ nc -l -p 5555
example1
example2
example3
example4
example5
and the following code in PySpark:
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

conf = SparkConf()
conf.setAppName("AccumulatorExample")
# Reuse an existing SparkContext (e.g. in the pyspark shell) or create one
sc = SparkContext.getOrCreate(conf=conf)
ssc = StreamingContext(sc, 5)  # 5-second batch interval
ssc.checkpoint("Checkpoint_Accumulator")
dataStream = ssc.socketTextStream("localhost", 5555)
exampleAccumulator = sc.accumulator(0)

def update_accumulator(x):
    # Increment the accumulator once per record; the record itself is unused
    global exampleAccumulator
    exampleAccumulator += 1
    return None

exampleStream = dataStream.map(lambda x: update_accumulator(x))
exampleStream.pprint()
ssc.start()
ssc.awaitTerminationOrTimeout(15)
print(exampleAccumulator.value)
This produces the following output:
-------------------------------------------
Time: 2018-12-11 10:15:55
-------------------------------------------
None
None
None
None
None
-------------------------------------------
Time: 2018-12-11 10:16:00
-------------------------------------------
-------------------------------------------
Time: 2018-12-11 10:16:05
-------------------------------------------
5
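In the example above, the value of exampleAccumulator is printed only after the stream terminates. I want the stream to run continuously and to be able to access exampleAccumulator.value while it is running. Is it possible to send exampleAccumulator.value to a socket so that another application can read it?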
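To make the goal concrete, here is a rough sketch of what I have in mind; I have not verified it against a running Spark job. The helper `send_value` and port 6666 are hypothetical names of my own. The commented-out part assumes that the function passed to `foreachRDD` runs on the driver, where the accumulator value can be read.

```python
import socket


def send_value(value, host="localhost", port=6666):
    """Hypothetical helper: send one reading, newline-terminated, to a listener.

    Opens a fresh TCP connection per reading, so the consumer must be
    listening on (host, port) before this is called.
    """
    with socket.create_connection((host, port), timeout=2) as conn:
        conn.sendall(f"{value}\n".encode("utf-8"))


# Intended use inside the streaming job (assumption, untested):
#
# def publish(time, rdd):
#     rdd.foreach(update_accumulator)        # action that triggers the updates
#     send_value(exampleAccumulator.value)   # read the running total on the driver
#
# dataStream.foreachRDD(publish)
```

The other application would then only need to accept a connection on that port and read one line per batch.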