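I need to write a stream from a WebSocket to Parquet files using Apache Spark. Spark's current streaming functionality does not seem to support WebSockets out of the box.
There is a built-in source for reading a stream from a TCP socket in Apache Spark, so I tried converting the WebSocket to a regular socket, but I have not yet managed to get Spark to read from the socket with my test script:
I set up the server like this: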
import socket, socketserver, time

class MyHandler(socketserver.BaseRequestHandler):
    def handle(self):
        counter = 1
        while True:
            #dataReceived = self.request.recv(1024)
            #if not dataReceived: break
            str_send = 'msg ' + str(counter)
            self.request.send(str_send.encode("utf-8"))
            counter += 1
            time.sleep(2)

myServer = socketserver.TCPServer(('localhost', 5146), MyHandler)
myServer.serve_forever()
This works fine with a plain client:
import socket, socketserver, time

def client(ip, port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((ip, port))
    while True:
        response = str(sock.recv(1024))
        print("Received: {}".format(response))

ip = 'localhost'
port = 5146
client(ip, port)
But when I use Spark's example for reading a TCP stream, no data ever arrives:
lines = spark \
    .readStream \
    .format("socket") \
    .option("host", "localhost") \
    .option("port", 5146) \
    .load()

query = lines.writeStream \
    .format("console") \
    .outputMode('append') \
    .start() \
    .awaitTermination()
I also tried writing to a file instead of the console, but the file is empty.
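The file-writing attempt itself is not shown here. For reference, a minimal sketch of what a Parquet sink might look like is below. It assumes `stream_df` is a streaming DataFrame such as `lines`; the function name `start_parquet_sink` and both directory arguments are hypothetical placeholders, not the code that was actually run.

```python
def start_parquet_sink(stream_df, output_dir, checkpoint_dir):
    """Start a streaming query that appends stream_df to Parquet files.

    Assumption: stream_df is a streaming DataFrame (e.g. from spark.readStream).
    output_dir and checkpoint_dir are placeholder paths chosen by the caller.
    """
    return (
        stream_df.writeStream
        .format("parquet")                             # file sink, Parquet format
        .outputMode("append")                          # file sinks only support append
        .option("path", output_dir)                    # where the Parquet files go
        .option("checkpointLocation", checkpoint_dir)  # required by file sinks
        .start()
    )
```

It would be called as `start_parquet_sink(lines, "/tmp/out", "/tmp/checkpoint").awaitTermination()`, with the paths adjusted to the environment.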
The connection is established, but no data comes through:
$ netstat -na | grep "5146"
tcp4 0 0 127.0.0.1.5146 127.0.0.1.59823 ESTABLISHED
tcp4 0 0 127.0.0.1.59823 127.0.0.1.5146 ESTABLISHED
tcp4 0 0 127.0.0.1.5146 *.* LISTEN
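One thing that may be worth ruling out: as far as I understand, Spark's socket source reads newline-delimited UTF-8 text, and the test server never sends a newline, so a line-oriented reader could wait forever for the end of the first line. The sketch below is not Spark itself. It uses only the Python standard library to show a server that terminates each message with `\n` and a consumer that reads complete lines. The names `frame`, `LineHandler`, `read_lines` and `demo` are made up for illustration.

```python
import socket
import socketserver
import threading


def frame(counter: int) -> bytes:
    """Encode one message and terminate it with a newline."""
    return f"msg {counter}\n".encode("utf-8")


class LineHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Send three newline-terminated messages, then return,
        # which lets the server close the connection.
        for counter in range(1, 4):
            self.request.sendall(frame(counter))


def read_lines(host: str, port: int) -> list:
    """Read complete lines until the peer closes the connection."""
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("r", encoding="utf-8") as reader:
            return [line.rstrip("\n") for line in reader]


def demo() -> list:
    # Port 0 asks the OS for any free port, so the sketch does not
    # collide with a server already listening on 5146.
    with socketserver.TCPServer(("127.0.0.1", 0), LineHandler) as server:
        port = server.server_address[1]
        # Serve exactly one connection in a background thread.
        thread = threading.Thread(target=server.handle_request, daemon=True)
        thread.start()
        lines = read_lines("127.0.0.1", port)
        thread.join()
        return lines
```

Calling `demo()` returns `['msg 1', 'msg 2', 'msg 3']`. If the newline is the issue, the equivalent change in the original server would be sending `str_send + '\n'`.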