I am trying to run the basic Kafka Structured Streaming word count program. It works on my personal computer, but in my client's VM it fails in a way I can't explain. I am trying to run the program in local mode.
Error:

java.io.IOException: Failed to connect to /10.233.8.214:51300
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:232)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
    at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:366)
    at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:332)
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:654)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:480)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:696)
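From the trace it looks like the executor cannot reach the driver's file server at 10.233.8.214:51300 to fetch the job's files. Below is a minimal sketch of what I am considering trying, assuming the VM is advertising an address that is not reachable locally; the 127.0.0.1 value is just an assumption for a purely local run, not something I have confirmed fixes it:

from pyspark.sql import SparkSession

# Sketch only: pin the driver's advertised and bind addresses for a local-mode run.
# Assumes the failure comes from an unreachable advertised IP; 127.0.0.1 is an assumed value.
spark = SparkSession \
    .builder \
    .appName("StructuredKafkaWordCount") \
    .config("spark.driver.host", "127.0.0.1") \
    .config("spark.driver.bindAddress", "127.0.0.1") \
    .getOrCreate()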
The Spark code I am running:
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession\
    .builder\
    .appName("StructuredKafkaWordCount")\
    .getOrCreate()

# Create DataSet representing the stream of input lines from kafka
lines = spark\
    .readStream\
    .format("kafka")\
    .option("kafka.bootstrap.servers", bootstrapServers)\
    .option(subscribeType, topics)\
    .load()\
    .selectExpr("CAST(value AS STRING)")

# Split the lines into words
words = lines.select(
    # explode turns each item in an array into a separate row
    explode(
        split(lines.value, ' ')
    ).alias('word')
)

# Generate running word count
wordCounts = words.groupBy('word').count()

# Start running the query that prints the running counts to the console
query = wordCounts\
    .writeStream\
    .outputMode('complete')\
    .format('console')\
    .start()

query.awaitTermination()
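For reference, bootstrapServers, subscribeType and topics are set elsewhere in the script; in the stock structured_kafka_wordcount.py example that this is based on, they come from the command-line arguments, roughly like this:

import sys

# As in Spark's structured_kafka_wordcount.py example:
# broker list, subscribe type ("assign", "subscribe" or "subscribePattern"), and topic(s).
bootstrapServers = sys.argv[1]
subscribeType = sys.argv[2]
topics = sys.argv[3]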