How can I send the result of an accumulator to a socket?

Asked: 2018-12-11 15:23:05

Tags: apache-spark pyspark spark-streaming

Input:

I have the following stream in NetCat:

$ nc -l -p 5555
example1
example2
example3
example4
example5

and the following code in PySpark:

from pyspark import SparkConf,SparkContext
from pyspark.streaming import StreamingContext
conf = SparkConf()
conf.setAppName("AccumulatorExample")
sc = SparkContext(conf=conf)  # create the SparkContext that the rest of the code uses

ssc = StreamingContext(sc, 5)
ssc.checkpoint("Checkpoint_Accumulator")
dataStream = ssc.socketTextStream("localhost",5555)

exampleAccumulator = sc.accumulator(0)

def update_accumulator(x):
    # runs on the executors; each incoming record increments the accumulator by 1
    global exampleAccumulator
    exampleAccumulator += 1
    return None

exampleStream = dataStream.map(lambda x: update_accumulator(x))
exampleStream.pprint()
ssc.start()
ssc.awaitTerminationOrTimeout(15)

print(exampleAccumulator.value)

Output:

-------------------------------------------
Time: 2018-12-11 10:15:55
-------------------------------------------
None
None
None
None
None

-------------------------------------------
Time: 2018-12-11 10:16:00
-------------------------------------------

-------------------------------------------
Time: 2018-12-11 10:16:05
-------------------------------------------

5

Goal:

In the example above, the value of exampleAccumulator is only printed after the stream has stopped. I want the stream to run continuously, and I want to be able to access exampleAccumulator.value while it is running. Is it possible to send exampleAccumulator.value to a socket so that other applications can read it?
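For reference, one possible approach (a minimal, untested sketch rather than a confirmed answer) is to read the accumulator on the driver inside foreachRDD, which Spark invokes once per batch on the driver, and write the current value to a socket from there. The port 5556 and the publish_accumulator helper below are illustrative, not part of any Spark API; a listener such as nc -lk 5556 is assumed on the other end:

import socket

def publish_accumulator(time, rdd):
    # Force this batch to be evaluated so the map() above has already
    # incremented the accumulator before we read it on the driver.
    rdd.count()
    try:
        # Hypothetical listener on localhost:5556 (e.g. started with `nc -lk 5556`).
        with socket.create_connection(("localhost", 5556), timeout=1) as s:
            s.sendall("{} {}\n".format(time, exampleAccumulator.value).encode("utf-8"))
    except OSError:
        pass  # no listener connected for this batch; skip

exampleStream = dataStream.map(update_accumulator)
exampleStream.foreachRDD(publish_accumulator)  # runs on the driver, once per batch
ssc.start()
ssc.awaitTermination()  # keep the stream running instead of timing out

Note that accumulator updates made inside transformations such as map are not guaranteed to be applied exactly once if a task is retried, so for a per-batch count the result of rdd.count() itself would be the more reliable number to publish.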

0 Answers:

There are no answers yet.