Unable to see Kafka consumer output when running in Eclipse: PySpark

Date: 2018-10-06 22:16:10

Tags: apache-spark pyspark apache-kafka spark-streaming

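I have installed Kafka and ZooKeeper on a Windows system. I started the ZooKeeper and Kafka servers, created the topic "javainuse-topic", and started a console producer and consumer with the following commands: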

  

.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties

.\bin\windows\kafka-server-start.bat .\config\server.properties

.\bin\windows\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic javainuse-topic

.\bin\windows\kafka-console-producer.bat --broker-list localhost:9092 --topic javainuse-topic

.\bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic javainuse-topic --from-beginning

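I can successfully send data from the producer to the consumer. I then wrote the following code in Eclipse and tried to run it locally, but I cannot see the consumed data in the Eclipse console.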

import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.3.0 pyspark-shell'

import sys
import time
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils


n_secs = 1
topic = "javainuse-topic"

conf = SparkConf().setAppName("KafkaStreamProcessor").setMaster("local[*]")
sc = SparkContext(conf=conf)
sc.setLogLevel("WARN")
ssc = StreamingContext(sc, n_secs)

kafkaStream = KafkaUtils.createDirectStream(ssc, [topic], {
                        'bootstrap.servers':'localhost:9092', 
                        'group.id':'javainuse-topic', 
                        'fetch.message.max.bytes':'15728640',
                        'auto.offset.reset':'largest'})
                        # Group ID is completely arbitrary

lines = kafkaStream.map(lambda x: x[1])
counts = lines.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
counts.pprint()

ssc.start()
time.sleep(6) # Let the stream run for 6 seconds before stopping it
# ssc.awaitTermination()
ssc.stop(stopSparkContext=True,stopGraceFully=True)
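
For reference, the per-batch transformation in the script (flatMap, map, reduceByKey) can be mirrored in plain Python to check what counts.pprint() should show for a given batch. This is only a sketch: word_counts and the sample batch are hypothetical names made up for illustration, and nothing here talks to Spark or Kafka.

```python
from collections import defaultdict


def word_counts(lines):
    """Plain-Python stand-in for the flatMap/map/reduceByKey chain (hypothetical helper)."""
    # flatMap: split every line on single spaces and flatten the result
    words = [word for line in lines for word in line.split(" ")]
    # map: pair each word with an initial count of 1
    pairs = [(word, 1) for word in words]
    # reduceByKey: add up the counts for each distinct word
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


# Sample batch, standing in for the message values received in one interval
batch = ["hello kafka", "hello spark"]
print(word_counts(batch))  # {'hello': 2, 'kafka': 1, 'spark': 1}
```

If the streaming job receives these two messages within one batch interval, pprint() should list the same three pairs, although the ordering in Spark's output may differ.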

1 Answer:

Answer 0 (score: 1)

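You can try again, but this time set auto.offset.reset to 'earliest' (or 'smallest' if you are using the old consumer). With 'largest', the stream starts at the end of the topic, so only messages produced while the job is running are shown.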

kafkaStream = KafkaUtils.createDirectStream(ssc, [topic], {
                        'bootstrap.servers':'localhost:9092', 
                        'group.id':'javainuse-topic', 
                        'fetch.message.max.bytes':'15728640',
                        'auto.offset.reset':'earliest'})
                        # Group ID is completely arbitrary
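
The choice between the two spellings can be made explicit with a small helper. This is a minimal sketch, not part of any Kafka or Spark API: build_kafka_params and its legacy_consumer flag are hypothetical names. It assumes the spark-streaming-kafka-0-8 package from the question is the older consumer integration, in which case 'smallest' is probably the value that takes effect.

```python
def build_kafka_params(brokers, group_id, read_from_start, legacy_consumer=True):
    """Build the Kafka parameter dict for createDirectStream (hypothetical helper)."""
    if legacy_consumer:
        # Old consumer naming (assumed to apply to the 0-8 integration)
        reset = "smallest" if read_from_start else "largest"
    else:
        # New consumer naming
        reset = "earliest" if read_from_start else "latest"
    return {
        "bootstrap.servers": brokers,
        "group.id": group_id,
        "auto.offset.reset": reset,
    }


params = build_kafka_params("localhost:9092", "javainuse-topic", read_from_start=True)
print(params["auto.offset.reset"])  # smallest
```

The resulting dict could then be passed as the third argument to KafkaUtils.createDirectStream in place of the literal dict above.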