I am working in a Jupyter notebook and want to simulate a server so that I can send dummy data to a Spark Streaming application running in another notebook.
My server code is:
# # -1) imports
import socket
import random
import time
# # 0) configuration
port = 12030
ip = socket.gethostname()
# # 1) create a socket
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.bind((ip, port))
serversocket.listen(1)
serversocket.setblocking(False)
# # 2) wait for Spark to connect
(clientsocket, address) = serversocket.accept()
print("Connection de %s :\n %s"%(address, clientsocket))
# # 3) send data
nb_client = 1000
nb_achat = 5
clients = ["client_%s"%x for x in range(nb_client)]
achats = [(random.choice(clients), random.randint(0, 100)) for x in range(nb_achat)]
tps_attente = 1
nb_achat = 50
for i in range(30):
    print(i)
    time.sleep(tps_attente)
    achats = [(random.choice(clients), random.randint(0, 100)) for x in range(nb_achat)]
    for client, valeur in achats:
        to_send = '%s,%s\n'%(client, valeur)
        clientsocket.send(to_send.encode())
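To check the server side independently of Spark, a plain TCP client can read a few lines from the socket. The following is only a minimal sketch and not part of my original setup: `demo_server`, `read_lines` and `run_demo` are hypothetical helper names, and the demo server is a stand-in bound to 127.0.0.1 on an OS-assigned port rather than the real server above.

```python
import socket
import threading


def demo_server(listener, lines):
    """Stand-in server: accept one client and send each line, newline-terminated."""
    conn, _ = listener.accept()
    with conn:
        for line in lines:
            conn.sendall((line + "\n").encode())


def read_lines(host, port, max_lines, timeout=5.0):
    """Connect as a plain TCP client and return up to max_lines decoded lines."""
    received = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="utf-8", newline="\n") as reader:
            for line in reader:
                received.append(line.rstrip("\n"))
                if len(received) >= max_lines:
                    break
    return received


def run_demo():
    """Start the stand-in server on a free local port and read its lines back."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    sample = ["client_1,10", "client_2,20"]  # made-up lines in "client,value" format
    thread = threading.Thread(target=demo_server, args=(listener, sample))
    thread.start()
    try:
        return read_lines("127.0.0.1", port, max_lines=len(sample))
    finally:
        thread.join()
        listener.close()


if __name__ == "__main__":
    print(run_demo())
```

Against the real server, the equivalent call would be something like `read_lines(socket.gethostname(), 12030, max_lines=5)`. Since the server calls `accept()` only once, this test client would take the place of the Spark connection for that run.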
My Spark Streaming notebook is:
import socket
from pyspark.sql import SparkSession
from pyspark.streaming import StreamingContext
listen_to_ip = socket.gethostname()
listen_to_port = 12030
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext
nb_secondes = 4
ssc = StreamingContext(sc, nb_secondes)
dstream = ssc.socketTextStream(listen_to_ip, listen_to_port)
ssc.checkpoint("./checkpoint/")
def update_achats(nouvelles_valeurs, valeur_actuelle):
    if valeur_actuelle is None:
        valeur_actuelle = 0
    return sum(nouvelles_valeurs, valeur_actuelle)
data = dstream.map(lambda x: x.split(","))
clients_facture = data.map(lambda x: (x[0], float(x[1])*float(x[2])))
update_client = clients_facture.updateStateByKey(update_achats)
update_client.pprint()
ssc.start()
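The two pure-Python pieces of this pipeline, the line parsing and the state update function, can be exercised without Spark. The sketch below is only a sanity check under assumptions: `parse_line` is a hypothetical helper that mirrors the `split(",")` step, and the sample line is made up in the `client,value` format that the server builds with `'%s,%s\n'`.

```python
def update_achats(nouvelles_valeurs, valeur_actuelle):
    """Same update function as in the streaming notebook."""
    if valeur_actuelle is None:
        valeur_actuelle = 0
    return sum(nouvelles_valeurs, valeur_actuelle)


def parse_line(line):
    """Hypothetical helper mirroring dstream.map(lambda x: x.split(","))."""
    return line.split(",")


# Made-up sample line in the "client,value" format produced by the server loop.
fields = parse_line("client_42,17")
print(len(fields))  # → 2

# First batch for a key: there is no previous state, so the result is the batch sum.
state = update_achats([10.0, 5.0], None)
# Later batch: the new values are added on top of the previous state.
state = update_achats([2.5], state)
print(state)  # → 17.5
```

With this line format the split yields two fields, so indexing `x[2]`, as the `clients_facture` step does, would raise an `IndexError` on such a line. I have not confirmed whether that is related to the empty output.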
So I start the server first, then the Spark Streaming notebook.
On the server side, I first see:
Connection de ('172.17.0.2', 40258) :
<socket.socket fd=44, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('172.17.0.2', 12030), raddr=('172.17.0.2', 40258)>
Then, still on the server, I see the loop progressing, which indicates that data is being sent:
0
1
2
3
4
5
6
7
8
9
10
...
In the Spark Streaming notebook, nothing appears :-(
I remember running into this problem a year ago, and it must be a configuration issue. Any clues?