Question

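I have created a stream on port 9999 of my computer.

I have to implement the DGIM algorithm on it. However, I am unable to read the bits in the data stream one by one. Here is my code: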

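from pyspark import SparkContext
from pyspark.streaming import StreamingContext
import math
sc = SparkContext("local[2]", "NetworkWordCount")
ssc = StreamingContext(sc, 1)
# Not in the original post: presumably how `lines` was created (host is assumed)
lines = ssc.socketTextStream("localhost", 9999)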

With the following commands I can print the stream batch by batch:

lines.pprint()
ssc.start()             # Start the computation
ssc.awaitTermination()

But when I try to print each bit, I get an error:

for l in lines.iter_lines():
   print l
ssc.start()             # Start the computation
ssc.awaitTermination()

Can someone tell me how to read each bit in the stream so that I can implement the algorithm?

Answer 1

I used the following code:

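def function(c):
    # c is the RDD for one batch; collect() returns its records to the driver as a list
    for record in c.collect():
        print(record)

streams.foreachRDD(function)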

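foreachRDD passes the function the RDD for each batch of the stream, and collect() then returns all the records of that batch to the driver, where they can be iterated like an ordinary list.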
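
To get from collected records to individual bits, something along these lines might work. It is only a sketch under a few assumptions: each record is a line of text containing '0' and '1' characters (anything else is skipped), the code runs on Python 3, and the stream comes from localhost. The names bits_from_records, make_batch_handler and main are made up for this example and are not part of PySpark.

```python
def bits_from_records(records):
    """Yield every '0'/'1' character found in the records as an int, in order."""
    for record in records:
        for ch in record:
            if ch in "01":  # skip spaces, newlines and anything else
                yield int(ch)


def make_batch_handler(on_bit):
    """Build a function for DStream.foreachRDD that calls on_bit(bit) per bit."""
    def handle_batch(rdd):
        # collect() brings this batch's records to the driver as a list
        for bit in bits_from_records(rdd.collect()):
            on_bit(bit)
    return handle_batch


def main():
    # Imported here so the helpers above can be tried without Spark installed
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "ReadBits")
    ssc = StreamingContext(sc, 1)  # 1-second batches, as in the question
    lines = ssc.socketTextStream("localhost", 9999)  # host is an assumption
    # print is a stand-in; pass the DGIM update function here instead
    lines.foreachRDD(make_batch_handler(print))
    ssc.start()
    ssc.awaitTermination()
```

Calling main() starts the stream. Because collect() pulls each whole batch onto the driver, this is only reasonable when a batch is small, which should be the case for a single local socket.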

