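I have created two separate Python files: (1) one reads data from a CSV file and sends it through a netcat server; (2) the other reads that data back from the netcat server.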
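Sending the data to the netcat server works. However, the second file, which uses Spark's socketTextStream, never reads anything from the netcat server. Interestingly, if I type data into the netcat terminal manually, the program does read it.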
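# Client code: sends the CSV lines to the netcat server
import socket
import time

HOST = 'localhost'
PORT = 8888

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
i = 0
with open("Only-R80711-SC.csv", "r") as fo:
    for line in fo:
        print(line, end="")
        # sendall() keeps writing until every byte is sent; send() may send only part
        s.sendall(line.encode('utf-8'))
        i = i + 1
        if i == 100:
            # Pause after every 100 lines; sleep(0) does not actually wait
            i = 0
            time.sleep(0)
print("Done sending")
s.close()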
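A likely cause, assuming netcat was started in listen mode with something like nc -lk 8888 (the exact command isn't shown above): the client script and Spark's socketTextStream are both TCP clients of netcat, and netcat never forwards bytes from one connection to another. It prints what it receives to its own stdout and sends its stdin (what you type in its terminal) to the peer it is currently serving. That would explain why typed input reaches Spark while the script's output does not. One way around this is to drop netcat and let the sender listen on port 8888 itself, so Spark connects to it directly. The sketch below does that. serve_lines, send_lines and the batch_size/pause parameters are hypothetical names for illustration. It has not been tested against this particular CSV or Spark setup.

```python
import socket
import time
from typing import Iterable


def send_lines(conn: socket.socket, lines: Iterable[str],
               batch_size: int = 100, pause: float = 0.0) -> int:
    """Send each line as a newline-terminated UTF-8 record; return the count sent."""
    sent = 0
    for line in lines:
        if not line.endswith("\n"):
            line += "\n"  # socketTextStream splits records on line breaks
        conn.sendall(line.encode("utf-8"))  # sendall retries partial writes
        sent += 1
        if pause > 0 and sent % batch_size == 0:
            time.sleep(pause)  # optional throttle after every full batch
    return sent


def serve_lines(path: str, host: str = "localhost", port: int = 8888,
                batch_size: int = 100, pause: float = 0.0) -> int:
    """Listen on (host, port), wait for one client (e.g. Spark) and stream the file to it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(1)
        conn, _addr = server.accept()  # blocks until a client connects
        with conn, open(path, "r", encoding="utf-8") as fo:
            return send_lines(conn, fo, batch_size, pause)
```

Under these assumptions, you would run serve_lines("Only-R80711-SC.csv") first, since it blocks in accept() until a client connects. Then start the Spark job unchanged, and socketTextStream("localhost", 8888) connects straight to the script instead of to netcat.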
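# Server code (Spark Streaming job that reads the stream)
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# local[2]: one thread runs the socket receiver, at least one processes batches
sc = SparkContext("local[2]", "NetworkWordCount")
ssc = StreamingContext(sc, 10)  # 10-second batch interval

# socketTextStream connects as a TCP *client* to localhost:8888
# and treats each UTF-8 text line it receives as one record
lines = ssc.socketTextStream("localhost", 8888)
lines.pprint()

ssc.start()             # Start the computation
ssc.awaitTermination()  # Wait for the computation to terminate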
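To see what actually arrives on localhost:8888 without involving Spark, you can use a small Python client that does roughly what socketTextStream's receiver does: connect as a TCP client, decode UTF-8, and treat each line as one record. This can help narrow things down. read_lines and peek_port are hypothetical debugging helpers, not part of PySpark.

```python
import socket
from itertools import islice
from typing import Iterator, List, Optional


def read_lines(sock: socket.socket) -> Iterator[str]:
    """Yield UTF-8 records from a connected socket, one per line, until EOF."""
    # newline="" recognises \n, \r\n and \r as line breaks without translating them
    with sock.makefile("r", encoding="utf-8", newline="") as stream:
        for line in stream:
            yield line.rstrip("\r\n")


def peek_port(host: str = "localhost", port: int = 8888, limit: int = 10,
              timeout: Optional[float] = None) -> List[str]:
    """Connect as a client and return up to `limit` records.

    Blocks until `limit` lines arrive or the sender closes the connection;
    with a timeout set, raises socket.timeout if the sender goes quiet.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        records = read_lines(sock)
        try:
            return list(islice(records, limit))
        finally:
            records.close()  # finish the generator so its file object is closed
```

For example, you could run peek_port() in place of the Spark job. If it returns lines typed into the netcat terminal but never the CSV rows, the data is being lost before it reaches any reader, not inside Spark.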