How do I stream a WebSocket with Spark Streaming?

Date: 2019-05-13 05:32:40

Tags: apache-spark websocket spark-streaming databricks

I need to write a WebSocket stream to Parquet files using Apache Spark. The current Apache Spark streaming functionality does not seem to support WebSockets out of the box.

There is a built-in way to read a stream from a TCP socket in apache-spark, so I tried converting the WebSocket into a regular socket, but I have not been able to get Spark to read from that socket with a test script.
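In principle, such a conversion could be done with a small WebSocket-to-TCP bridge process. The sketch below is one hypothetical way to do it, assuming the third-party `websockets` package (`pip install websockets`); the URL is a placeholder, and each message is terminated with a newline because Spark's socket source reads newline-delimited UTF-8 text:

```python
# Hypothetical bridge: forward WebSocket messages to a local TCP port
# that Spark's "socket" source can then read from.
# Assumes: third-party `websockets` package, placeholder ws:// URL.
import asyncio
import websockets

WS_URL = "ws://example.com/feed"   # placeholder; substitute the real feed
TCP_HOST, TCP_PORT = "localhost", 5146

async def bridge(reader, writer):
    # For each TCP client (e.g. Spark), open the WebSocket and relay messages.
    async with websockets.connect(WS_URL) as ws:
        async for message in ws:
            # Spark's socket source reads line-delimited text,
            # so terminate every record with '\n'.
            writer.write((str(message) + "\n").encode("utf-8"))
            await writer.drain()

async def main():
    server = await asyncio.start_server(bridge, TCP_HOST, TCP_PORT)
    async with server:
        await server.serve_forever()

asyncio.run(main())
```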

I set up the server like this:

import socketserver, time

class MyHandler(socketserver.BaseRequestHandler):
    def handle(self):
        counter = 1
        while True:
            #dataReceived = self.request.recv(1024)
            #if not dataReceived: break
            # send a numbered test message every two seconds
            str_send = 'msg ' + str(counter)
            self.request.send(str_send.encode("utf-8"))
            counter += 1
            time.sleep(2)

myServer = socketserver.TCPServer(('localhost', 5146), MyHandler)
myServer.serve_forever()
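One likely reason Spark sees nothing here: the socket source reads newline-delimited UTF-8 text, and the test server above never sends a `\n`, so from Spark's point of view no record is ever complete. A self-contained sketch of a newline-terminated variant (with a throwaway client in place of Spark, and a shortened message count and sleep for the demo):

```python
import socket, socketserver, threading, time

class LineHandler(socketserver.BaseRequestHandler):
    """Sends newline-terminated messages. Spark's "socket" source reads
    the stream line by line, so each record must end in '\n'."""
    def handle(self):
        for counter in range(1, 4):           # a few messages for the demo
            msg = 'msg ' + str(counter) + '\n'  # note the trailing newline
            self.request.sendall(msg.encode("utf-8"))
            time.sleep(0.1)

# Port 0 lets the OS pick a free port, to keep the demo self-contained.
server = socketserver.TCPServer(('localhost', 0), LineHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Minimal stand-in for Spark: read until the server closes the connection,
# then split the byte stream into newline-delimited records.
sock = socket.create_connection(('localhost', port))
data = b""
while True:
    chunk = sock.recv(1024)
    if not chunk:
        break
    data += chunk
lines = data.decode("utf-8").splitlines()
print(lines)  # ['msg 1', 'msg 2', 'msg 3']
server.shutdown()
```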

This works fine with a plain socket client:

import socket

def client(ip, port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((ip, port))
    while True:
        # decode the raw bytes instead of str(), which would print b'...'
        response = sock.recv(1024).decode("utf-8")
        print("Received: {}".format(response))

ip = 'localhost'
port = 5146
client(ip, port)

But when I read the TCP stream with Spark's own example code, I still get no data:

lines = spark \
    .readStream \
    .format("socket") \
    .option("host", "localhost") \
    .option("port", 5146) \
    .load()

query = lines.writeStream\
      .format("console")\
      .outputMode('append')\
      .start()\
      .awaitTermination()
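For the stated goal of writing to Parquet: once records are arriving, the console sink can be swapped for Structured Streaming's file sink. A sketch, with placeholder paths; the file sink requires a checkpoint location and supports append output mode:

```python
# Hedged sketch: Parquet sink for the same `lines` stream.
# The /tmp/... paths are placeholders; pick durable locations in practice.
query = lines.writeStream \
    .format("parquet") \
    .option("path", "/tmp/websocket-parquet") \
    .option("checkpointLocation", "/tmp/websocket-checkpoint") \
    .outputMode("append") \
    .start()
query.awaitTermination()
```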

I also tried writing to a file, but the file stays empty.

The connection is established, but no data comes through:

$ netstat -na | grep "5146"
tcp4       0      0  127.0.0.1.5146         127.0.0.1.59823        ESTABLISHED
tcp4       0      0  127.0.0.1.59823        127.0.0.1.5146         ESTABLISHED
tcp4       0      0  127.0.0.1.5146         *.*                    LISTEN

0 Answers:

No answers yet.