I am trying to run the basic Kafka Structured Streaming word count program. It works on my personal computer, but in my client's VM it fails in a way I can't explain. I am trying to run the program in local mode.
Error:

java.io.IOException: Failed to connect to /10.233.8.214:51300
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:232)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
    at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:366)
    at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:332)
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:654)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:480)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:696)
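From the trace it looks like the executor cannot reach the driver's file server at 10.233.8.214:51300 to fetch the job's files. Below is a minimal sketch of what I am considering trying, assuming the VM is advertising an address that is not reachable locally; the 127.0.0.1 value is just an assumption for a purely local run, not something I have confirmed fixes it:

from pyspark.sql import SparkSession

# Sketch only: pin the driver's advertised and bind addresses for a local-mode run.
# Assumes the failure comes from an unreachable advertised IP; 127.0.0.1 is an assumed value.
spark = SparkSession \
    .builder \
    .appName("StructuredKafkaWordCount") \
    .config("spark.driver.host", "127.0.0.1") \
    .config("spark.driver.bindAddress", "127.0.0.1") \
    .getOrCreate()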
The Spark code I am running:
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession\
    .builder\
    .appName("StructuredKafkaWordCount")\
    .getOrCreate()

# Create DataSet representing the stream of input lines from kafka
lines = spark\
    .readStream\
    .format("kafka")\
    .option("kafka.bootstrap.servers", bootstrapServers)\
    .option(subscribeType, topics)\
    .load()\
    .selectExpr("CAST(value AS STRING)")

# Split the lines into words
words = lines.select(
    # explode turns each item in an array into a separate row
    explode(
        split(lines.value, ' ')
    ).alias('word')
)

# Generate running word count
wordCounts = words.groupBy('word').count()

# Start running the query that prints the running counts to the console
query = wordCounts\
    .writeStream\
    .outputMode('complete')\
    .format('console')\
    .start()

query.awaitTermination()
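For reference, bootstrapServers, subscribeType and topics are set elsewhere in the script; in the stock structured_kafka_wordcount.py example that this is based on, they come from the command-line arguments, roughly like this:

import sys

# As in Spark's structured_kafka_wordcount.py example:
# broker list, subscribe type ("assign", "subscribe" or "subscribePattern"), and topic(s).
bootstrapServers = sys.argv[1]
subscribeType = sys.argv[2]
topics = sys.argv[3]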