Kafka Structured Streaming java.io.IOException: Failed to connect to /10.233.8.214:50723 error

Time: 2017-09-14 15:17:16

Tags: apache-spark pyspark

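I am trying to run a basic Kafka Structured Streaming word count program. It works on my personal computer, but on my client's virtual machine it fails, and I cannot work out why. I am running the program in local mode.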

Error:

    java.io.IOException: Failed to connect to /10.233.8.214:51300
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:232)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
        at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:366)
        at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:332)
        at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:654)
        at org.apache.spark.util.Utils$.fetchFile(Utils.scala:480)
        at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:696)

The Spark code I am running:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import explode, split

    # bootstrapServers, subscribeType and topics are not defined in this
    # excerpt; they are assumed to be set earlier in the script.
    spark = SparkSession\
        .builder\
        .appName("StructuredKafkaWordCount")\
        .getOrCreate()

    # Create a DataFrame representing the stream of input lines from Kafka
    lines = spark\
        .readStream\
        .format("kafka")\
        .option("kafka.bootstrap.servers", bootstrapServers)\
        .option(subscribeType, topics)\
        .load()\
        .selectExpr("CAST(value AS STRING)")

    words = lines.select(
        # explode turns each item in an array into a separate row
        explode(
            split(lines.value, ' ')
        ).alias('word')
    )

    # Generate the running word count
    wordCounts = words.groupBy('word').count()

    # Start the query that prints the running counts to the console
    query = wordCounts\
        .writeStream\
        .outputMode('complete')\
        .format('console')\
        .start()

    # Block until the streaming query is stopped or fails
    query.awaitTermination()
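
The trace above fails inside Utils.fetchFile while opening a channel to 10.233.8.214, which is presumably the address the driver advertised on this VM. One thing that may be worth checking, independently of Spark, is whether the machine can open a TCP connection to itself on a given address. The following is only a diagnostic sketch using Python's standard library: the helper name can_reach_self is hypothetical, it handles IPv4 addresses only, and it does not establish the cause of the error or fix it.

```python
import socket


def can_reach_self(address, timeout=2.0):
    """Return True if this machine can connect to a listener bound on `address`."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
            # Port 0 asks the OS for any free port, loosely mirroring the
            # ephemeral port that appears in the error message.
            server.bind((address, 0))
            server.listen(1)
            port = server.getsockname()[1]
            # The TCP handshake completes in the listen backlog, so no
            # accept() call is needed for the connection to succeed.
            with socket.create_connection((address, port), timeout=timeout):
                return True
    except OSError:
        # Covers bind failures (address not local), refused or filtered
        # connections, and timeouts.
        return False


if __name__ == "__main__":
    candidates = ["127.0.0.1"]
    try:
        # The address this machine's hostname resolves to
        candidates.append(socket.gethostbyname(socket.gethostname()))
    except OSError:
        pass
    for addr in candidates:
        print(addr, can_reach_self(addr))
```

If the loopback address is reachable but the address the hostname resolves to is not, that would suggest looking at the VM's hostname resolution or firewall rules rather than at the streaming code.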

0 answers:

No answers yet