I use the following Python producer to publish some messages to my Kafka topic (I can also receive the data I publish perfectly using a Python consumer in Jupyter).
from kafka import KafkaProducer
import json, time

# Sample payload to publish
userdata = {
    "ipaddress": "172.16.0.57",
    "logtype": "",
    "mid": "",
    "name": "TJ"
}

# JSON-serialize each value before it is sent to the broker
producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
                         value_serializer=lambda v: json.dumps(v).encode('utf-8'))

for i in range(10):
    print("adding", i)
    producer.send('test', userdata)
    time.sleep(3)
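As an aside, kafka-python's send() only enqueues the record asynchronously, so a script that exits right after the loop can drop buffered messages. A minimal addition to the producer above (flush() is part of the standard KafkaProducer API):

# Block until every buffered record has actually reached the broker
producer.flush()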
But when I try to run the Kafka streaming example in Spark, I get nothing (I should note that Spark on my workstation is operational, since I can run the network streaming example without any issues):
from __future__ import print_function
from pyspark.streaming.kafka import KafkaUtils
import sys
import os
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
import json

# Pull in the Kafka integration jar (see the answer below: this package version is the culprit)
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8_2.10:2.0.2 pyspark-shell'

sc = SparkContext("local[2]", "KafkaSTREAMWordCount")
ssc = StreamingContext(sc, 2)

# Receiver-based stream: connects through ZooKeeper on localhost:2181
kafka_stream = KafkaUtils.createStream(ssc, "localhost:2181", "raw-event-streaming-consumer", {"test": 1})

# Messages arrive as (key, value) pairs; parse the JSON value
parsed = kafka_stream.map(lambda (k, v): json.loads(v))
parsed.pprint()

ssc.start()
ssc.awaitTermination()
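For what it's worth, the 0-8 integration also ships a receiver-less variant that talks to the broker directly instead of going through ZooKeeper. A minimal sketch against the same ssc and topic (createDirectStream is part of the same KafkaUtils API; the broker address is assumed to match the producer above):

# Direct (receiver-less) stream: reads from the broker itself, no ZooKeeper hop
direct_stream = KafkaUtils.createDirectStream(
    ssc, ["test"], {"metadata.broker.list": "localhost:9092"})
parsed_direct = direct_stream.map(lambda (k, v): json.loads(v))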
Running the createStream code above yields nothing but empty batches; here is a sample of the output:
-------------------------------------------
Time: 2017-08-28 14:08:32
-------------------------------------------
-------------------------------------------
Time: 2017-08-28 14:08:33
-------------------------------------------
-------------------------------------------
Time: 2017-08-28 14:08:34
-------------------------------------------
Note: my system specs are as follows:
Ubuntu 16.04
Spark: spark-2.2.0-bin-hadoop2.7
Jupyter Notebook (Python 2.7)
Kafka: kafka_2.11-0.11.0.0
I have the following lines in my .bashrc:
export PATH="/home/myubuntu/anaconda3/bin:$PATH"
export PATH="/home/myubuntu/Desktop/spark-2.2.0-bin-hadoop2.7/bin:$PATH"
export PATH="/home/myubuntu/Desktop/spark-2.2.0-bin-hadoop2.7/jars:$PATH"
export PATH="/home/myubuntu/Desktop/spark-2.2.0-bin-hadoop2.7/python:$PATH"
export PATH="/home/myubuntu/Desktop/spark-2.2.0-bin-hadoop2.7/python/pyspark:$PATH"
export PATH="/home/myubuntu/Desktop/spark-2.2.0-bin-hadoop2.7/python/pyspark/streaming:$PATH"
function snotebook ()
{
#Spark path (based on your computer)
SPARK_PATH=~/spark-2.0.0-bin-hadoop2.7
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
# For python 3 users, you have to add the line below or you will get an error
#export PYSPARK_PYTHON=python3
#$SPARK_PATH/bin/pyspark --master local[2]
/home/myubuntu/Desktop/spark-2.2.0-bin-hadoop2.7/bin/pyspark --master local[2]
}
Answer 0 (score: 0):
I found the error. With spark-2.2.0-bin-hadoop2.7, we need to use the following jar:
--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0
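In other words, the Kafka integration package has to match both Spark's Scala build (2.11) and the Spark version (2.2.0), not the 2.10/2.0.2 artifact used in the question. A minimal corrected version of the submit-args line (as in the original code, it must run before the SparkContext is created):

import os
# Scala 2.11 / Spark 2.2.0 build of the 0-8 Kafka integration
os.environ['PYSPARK_SUBMIT_ARGS'] = ('--packages '
    'org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 pyspark-shell')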