Unable to send a stream to Spark

Time: 2018-05-19 13:31:54

Tags: python string apache-spark pyspark encode

On the server side, I set up a simple TCP server as follows:

import socket
from time import sleep

host = 'localhost'
port = 9999
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((host, port))
s.listen(1)
print('\nListening for a client at', host, port)
conn, addr = s.accept()
print('\nConnected by', addr)
try:
    while True:
        with open('FireData-Part2.csv') as f:
            header = next(f)  # skip the header; read from the second line of the CSV file
            for line in f:
                out = "linesdaadsa".encode('utf-8')  # !!! IT DOES NOT WORK HERE !!!
                # out = line.encode('utf-8')  # !!! IT WORKS HERE !!!
                conn.send(out)  # conn is the socket returned by s.accept()
                print('Sending line', line)
                sleep(0.2)  # 0.2 s pause, so about 5 lines of fire data are sent per second
            print('End Of Stream fire.')
except socket.error:
    print('Error occurred.\n\nClient disconnected.\n')

On the client side, I wrote something simple:

import sys
import pymongo
from pymongo import MongoClient
from pprint import pprint
from pyspark import SparkContext
from pyspark.streaming import StreamingContext   


sc = SparkContext.getOrCreate()

if (sc is None):
    sc = SparkContext(appName="MongoDBApp")
ssc = StreamingContext(sc, 5)

host = "localhost"
port = 9999  # must match the port the server listens on

lines = ssc.socketTextStream(host, int(port))

lines.pprint()

ssc.start()

try:
    #deliberately cancel the execution after one minute
    ssc.awaitTermination(timeout=60)
except KeyboardInterrupt:
    ssc.stop()
    sc.stop()

ssc.stop()
sc.stop()
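However, nothing gets printed. All I am trying to do is send a simple string such as "linesdaadsa" to the client. If I encode `line` (the row read from the CSV file) directly, it does get printed! This seems very strange to me, because both are strings of the same type. Why can't I simply encode a string and send it to the client?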

1 Answer:

Answer 0: (score: 0)

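This problem is very similar to the previous question I asked, and the solution is the same: I appended "\n" to the string data before sending it. Basically, it looks like this: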

out = "linesdaadsa\n".encode('utf-8')

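or      out = (line + "\n").encode('utf-8')

I don't know why this is necessary; I guess it is some quirk of Spark. It is quite annoying for a newcomer like me. Does anyone know why this happens? Thanks!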
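
A likely explanation, offered as a sketch rather than a definitive diagnosis: `socketTextStream` interprets the received bytes as UTF-8 text delimited by `\n`, so a record only becomes visible once its terminating newline arrives. Lines obtained by iterating over a file keep their trailing `\n`, which would explain why `line.encode('utf-8')` worked while the bare literal did not. The snippet below illustrates the idea without Spark; `frame_line` and `complete_lines` are hypothetical helpers, not part of any library, and `complete_lines` only approximates how a line-based reader behaves while the connection stays open.

```python
import io


def frame_line(text: str) -> bytes:
    """Encode text as exactly one newline-terminated UTF-8 record (hypothetical helper)."""
    # Strip any existing line ending first so the record never ends in a double newline.
    return (text.rstrip("\r\n") + "\n").encode("utf-8")


def complete_lines(buffer: bytes) -> list:
    """Return only the newline-terminated lines found in buffer.

    Simplified stand-in for a line-based reader: an unterminated tail is
    treated as "still waiting for more bytes" and is not returned.
    """
    lines = []
    for raw in io.BytesIO(buffer):  # iterating a BytesIO splits on b"\n"
        if raw.endswith(b"\n"):
            lines.append(raw.decode("utf-8").rstrip("\r\n"))
    return lines


# Three sends without a newline: the reader never sees a complete line.
unterminated = "linesdaadsa".encode("utf-8") * 3
print(complete_lines(unterminated))

# The same three sends, each framed with a trailing newline.
terminated = b"".join(frame_line("linesdaadsa") for _ in range(3))
print(complete_lines(terminated))
```

In the server loop, the framed version would be sent with something like `conn.sendall(frame_line(line))`; `sendall` is used here on the assumption that each record should be written completely before the next one.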