Spark keeps resetting the offset to X

Time: 2019-12-11 08:35:22

Tags: apache-spark apache-kafka spark-structured-streaming

I'm developing a Spark Structured Streaming application with Kafka. It works fine except for one thing: Spark keeps resetting the offsets of all partitions to X, which consumes a lot of network I/O and CPU. The CPU consumption becomes quite noticeable if I add more Kafka consumers: nearly 25% CPU usage while the job is idle. Is this normal behavior, or am I missing some configuration?

I put together a minimal Spark Kafka consumer application to demonstrate the problem.

Here is the log from a fresh start, while the application is sitting idle.

19/12/11 16:24:29 INFO Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-b2457938-d427-47e2-b90a-7c6f0d85904b--1563005380-driver-0] Resetting offset for partition testJson-0 to offset 2.
19/12/11 16:24:29 INFO Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-b2457938-d427-47e2-b90a-7c6f0d85904b--1563005380-driver-0] Resetting offset for partition testJson-0 to offset 2.
19/12/11 16:24:29 INFO Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-b2457938-d427-47e2-b90a-7c6f0d85904b--1563005380-driver-0] Resetting offset for partition testJson-0 to offset 2.
19/12/11 16:24:30 INFO Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-b2457938-d427-47e2-b90a-7c6f0d85904b--1563005380-driver-0] Resetting offset for partition testJson-0 to offset 2.
19/12/11 16:24:30 INFO Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-b2457938-d427-47e2-b90a-7c6f0d85904b--1563005380-driver-0] Resetting offset for partition testJson-0 to offset 2.
(the same line repeats continuously; the excerpt above shows only a few of dozens of identical entries logged within two seconds)

The project

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.OutputMode;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.StreamingQueryException;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.*;

public class Main {

    public Main() throws StreamingQueryException {
        StructType schema = DataTypes.createStructType(new StructField[]{
                DataTypes.createStructField("id", DataTypes.LongType, false),
                DataTypes.createStructField("name", DataTypes.StringType, false),
                DataTypes.createStructField("value", DataTypes.LongType, false),
        });

        SparkConf conf =  new SparkConf(true)
                .setMaster("local[1]")
                .set("spark.default.parallelism", "1")
                .setAppName("spark-kafka-demo1");

        JavaSparkContext context = new JavaSparkContext(conf);

        SparkSession session = SparkSession
                .builder()
                .config(conf)
                .sparkContext(context.sc())
                .appName("spark-kafka-demo1")
                .getOrCreate();

        Dataset<Row> dataset = session
                .readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "192.168.0.201:9092")
                .option("subscribe", "testJson")
                .load()
                .selectExpr("CAST(value AS STRING) as message")
                .select(from_json(col("message"), schema).as("t"));

        StreamingQuery query = dataset
                .groupBy("t.name")
                .agg(sum("t.value"))
                .writeStream()
                .format("console")
                .option("truncate", false)
                .outputMode(OutputMode.Update())
                .start();

        query.awaitTermination();
    }

    public static void main(String[] args) throws StreamingQueryException {
        new Main();
    }
}
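
In case it helps reproduce the issue, here is a minimal producer sketch that publishes a few JSON records matching the {id, name, value} schema to the testJson topic. It uses the same broker address as the code above and assumes kafka-clients is on the classpath (it is pulled in transitively by spark-sql-kafka-0-10); the record contents are just placeholder values.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TestJsonProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.0.201:9092"); // same broker as the streaming job
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Publish a handful of JSON records whose fields match the StructType in Main.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (long i = 0; i < 10; i++) {
                String json = String.format("{\"id\":%d,\"name\":\"name-%d\",\"value\":%d}", i, i % 3, i * 10);
                producer.send(new ProducerRecord<>("testJson", json));
            }
            producer.flush();
        }
    }
}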

Maven dependencies

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.4.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>2.4.4</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
            <version>2.4.4</version>
        </dependency>

0 answers:

No answers yet