How to produce messages from Confluent's Kafka dummy data generator (datagen) to Elasticsearch?

Date: 2018-11-01 07:01:10

Tags: elasticsearch apache-kafka apache-kafka-connect confluent

I am trying to produce messages from Confluent's Kafka dummy data generator (datagen) to Elasticsearch:

Initializing the data source (avro file taken from here):

./bin/ksql-datagen schema=~/impressions.avro bootstrap-server=host009:9092 format=json key=impressionid topic=impressions2 maxInterval=1000

Initializing the connector:

./bin/connect-standalone ./etc/schema-registry/connect-avro-standalone.properties ./etc/kafka-connect-elasticsearch/quickstart-elasticsearch.properties

Connector initialization error:

[2018-11-01 09:32:41,155] ERROR WorkerSinkTask{id=elasticsearch-sink-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:177)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:510)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:490)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:321)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:225)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:193)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.connect.errors.DataException: impressions2
        at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:97)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$0(WorkerSinkTask.java:510)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
        ... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
[2018-11-01 09:32:41,157] ERROR WorkerSinkTask{id=elasticsearch-sink-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:178)
[2018-11-01 09:32:41,157] INFO Stopping ElasticsearchSinkTask. (io.confluent.connect.elasticsearch.ElasticsearchSinkTask:179)

However, if I produce the messages manually with the console producer (./bin/kafka-avro-console-producer) instead of datagen, everything works and I can see the results in Elasticsearch.
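For reference, the manual run looked roughly like the sketch below. This is an illustration only: the single-field value.schema is a made-up placeholder for the real contents of impressions.avro, and the schema.registry.url is assumed to match the one in my connect properties further down.

./bin/kafka-avro-console-producer --broker-list host009:9092 --topic impressions2 --property schema.registry.url=http://host008:8081 --property value.schema='{"type":"record","name":"impression","fields":[{"name":"impressionid","type":"string"}]}'

Each line typed afterwards is one record, e.g. {"impressionid": "imp-1"}.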

Question: how do I get the datagen messages into Elasticsearch?

UPDATE

Initializing datagen with format=avro gives:

[2018-11-02 16:51:11,087] INFO StreamsConfig values: 
        application.id = 
        application.server = 
        bootstrap.servers = []
        buffered.records.per.partition = 1000
        cache.max.bytes.buffering = 10485760
        client.id = 
        commit.interval.ms = 30000
        connections.max.idle.ms = 540000
        default.deserialization.exception.handler = class org.apache.kafka.streams.errors.LogAndFailExceptionHandler
        default.key.serde = class org.apache.kafka.common.serialization.Serdes$ByteArraySerde
        default.production.exception.handler = class org.apache.kafka.streams.errors.DefaultProductionExceptionHandler
        default.timestamp.extractor = class org.apache.kafka.streams.processor.FailOnInvalidTimestamp
        default.value.serde = class org.apache.kafka.common.serialization.Serdes$ByteArraySerde
        metadata.max.age.ms = 300000
        metric.reporters = []
        metrics.num.samples = 2
        metrics.recording.level = INFO
        metrics.sample.window.ms = 30000
        num.standby.replicas = 0
        num.stream.threads = 1
        partition.grouper = class org.apache.kafka.streams.processor.DefaultPartitionGrouper
        poll.ms = 100
        processing.guarantee = at_least_once
        receive.buffer.bytes = 32768
        reconnect.backoff.max.ms = 1000
        reconnect.backoff.ms = 50
        replication.factor = 1
        request.timeout.ms = 40000
        retries = 0
        retry.backoff.ms = 100
        rocksdb.config.setter = null
        security.protocol = PLAINTEXT
        send.buffer.bytes = 131072
        state.cleanup.delay.ms = 600000
        state.dir = /tmp/kafka-streams
        topology.optimization = none
        upgrade.from = null
        windowstore.changelog.additional.retention.ms = 86400000
 (org.apache.kafka.streams.StreamsConfig:279)
[2018-11-02 16:51:11,090] INFO KsqlConfig values: 
        ksql.extension.dir = ext
        ksql.output.topic.name.prefix = 
        ksql.persistent.prefix = query_
        ksql.schema.registry.url = http://localhost:8081
        ksql.service.id = default_
        ksql.sink.partitions = 4
        ksql.sink.replicas = 1
        ksql.sink.window.change.log.additional.retention = 1000000
        ksql.statestore.suffix = _ksql_statestore
        ksql.transient.prefix = transient_
        ksql.udf.collect.metrics = false
        ksql.udf.enable.security.manager = true
        ksql.udfs.enabled = true
        ssl.cipher.suites = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.endpoint.identification.algorithm = https
        ssl.key.password = null
        ssl.keymanager.algorithm = SunX509
        ssl.keystore.location = null
        ssl.keystore.password = null
        ssl.keystore.type = JKS
        ssl.protocol = TLS
        ssl.provider = null
        ssl.secure.random.implementation = null
        ssl.trustmanager.algorithm = PKIX
        ssl.truststore.location = null
        ssl.truststore.password = null
        ssl.truststore.type = JKS
 (io.confluent.ksql.util.KsqlConfig:279)
Outputting 1000000 to impressions3
[2018-11-02 16:51:11,432] INFO AvroDataConfig values: 
        schemas.cache.config = 1
        enhanced.avro.schema.support = false
        connect.meta.data = true
 (io.confluent.connect.avro.AvroDataConfig:179)
[2018-11-02 16:51:11,458] INFO AvroConverterConfig values: 
        schema.registry.url = [http://localhost:8081]
        basic.auth.user.info = [hidden]
        auto.register.schemas = true
        max.schemas.per.subject = 1000
        basic.auth.credentials.source = URL
        schema.registry.basic.auth.user.info = [hidden]
        value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
        key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
 (io.confluent.connect.avro.AvroConverterConfig:179)
[2018-11-02 16:51:11,466] INFO KafkaAvroSerializerConfig values: 
        schema.registry.url = [http://localhost:8081]
        basic.auth.user.info = [hidden]
        auto.register.schemas = true
        max.schemas.per.subject = 1000
        basic.auth.credentials.source = URL
        schema.registry.basic.auth.user.info = [hidden]
        value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
        key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
 (io.confluent.kafka.serializers.KafkaAvroSerializerConfig:179)
[2018-11-02 16:51:11,469] INFO KafkaAvroDeserializerConfig values: 
        schema.registry.url = [http://localhost:8081]
        basic.auth.user.info = [hidden]
        auto.register.schemas = true
        max.schemas.per.subject = 1000
        basic.auth.credentials.source = URL
        schema.registry.basic.auth.user.info = [hidden]
        specific.avro.reader = false
        value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
        key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
 (io.confluent.kafka.serializers.KafkaAvroDeserializerConfig:179)
[2018-11-02 16:51:11,470] INFO AvroDataConfig values: 
        schemas.cache.config = 1000
        enhanced.avro.schema.support = false
        connect.meta.data = false
 (io.confluent.connect.avro.AvroDataConfig:179)
[2018-11-02 16:51:11,470] INFO AvroConverterConfig values: 
        schema.registry.url = [http://localhost:8081]
        basic.auth.user.info = [hidden]
        auto.register.schemas = true
        max.schemas.per.subject = 1000
        basic.auth.credentials.source = URL
        schema.registry.basic.auth.user.info = [hidden]
        value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
        key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
 (io.confluent.connect.avro.AvroConverterConfig:179)
[2018-11-02 16:51:11,471] INFO KafkaAvroSerializerConfig values: 
        schema.registry.url = [http://localhost:8081]
        basic.auth.user.info = [hidden]
        auto.register.schemas = true
        max.schemas.per.subject = 1000
        basic.auth.credentials.source = URL
        schema.registry.basic.auth.user.info = [hidden]
        value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
        key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
 (io.confluent.kafka.serializers.KafkaAvroSerializerConfig:179)
[2018-11-02 16:51:11,471] INFO KafkaAvroDeserializerConfig values: 
        schema.registry.url = [http://localhost:8081]
        basic.auth.user.info = [hidden]
        auto.register.schemas = true
        max.schemas.per.subject = 1000
        basic.auth.credentials.source = URL
        schema.registry.basic.auth.user.info = [hidden]
        specific.avro.reader = false
        value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
        key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
 (io.confluent.kafka.serializers.KafkaAvroDeserializerConfig:179)
[2018-11-02 16:51:11,471] INFO AvroDataConfig values: 
        schemas.cache.config = 1000
        enhanced.avro.schema.support = false
        connect.meta.data = false
 (io.confluent.connect.avro.AvroDataConfig:179)
[2018-11-02 16:51:11,800] ERROR Failed to send HTTP request to endpoint: http://localhost:8081/subjects/impressions3-value/versions (io.confluent.kafka.schemaregistry.client.rest.RestService:176)
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at java.net.Socket.connect(Socket.java:538)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
        at sun.net.www.http.HttpClient.New(HttpClient.java:308)
        at sun.net.www.http.HttpClient.New(HttpClient.java:326)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1316)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1291)
        at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:172)
        at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:229)
        at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:320)
        at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:312)
        at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:307)
        at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(CachedSchemaRegistryClient.java:114)
        at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:153)
        at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:79)
        at io.confluent.connect.avro.AvroConverter$Serializer.serialize(AvroConverter.java:116)
        at io.confluent.connect.avro.AvroConverter.fromConnectData(AvroConverter.java:75)
        at io.confluent.ksql.serde.connect.KsqlConnectSerializer.serialize(KsqlConnectSerializer.java:44)
        at io.confluent.ksql.serde.connect.KsqlConnectSerializer.serialize(KsqlConnectSerializer.java:27)
        at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:65)
        at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:55)
        at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:854)
        at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:816)
        at io.confluent.ksql.datagen.DataGenProducer.populateTopic(DataGenProducer.java:94)
        at io.confluent.ksql.datagen.DataGen.main(DataGen.java:100)
Exception in thread "main" org.apache.kafka.common.errors.SerializationException: Error serializing row to topic impressions3 using Converter API
Caused by: org.apache.kafka.connect.errors.DataException: impressions3
        at io.confluent.connect.avro.AvroConverter.fromConnectData(AvroConverter.java:77)
        at io.confluent.ksql.serde.connect.KsqlConnectSerializer.serialize(KsqlConnectSerializer.java:44)
        at io.confluent.ksql.serde.connect.KsqlConnectSerializer.serialize(KsqlConnectSerializer.java:27)
        at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:65)
        at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:55)
        at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:854)
        at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:816)
        at io.confluent.ksql.datagen.DataGenProducer.populateTopic(DataGenProducer.java:94)
        at io.confluent.ksql.datagen.DataGen.main(DataGen.java:100)
Caused by: org.apache.kafka.common.errors.SerializationException: Error serializing Avro message
Caused by: java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at java.net.Socket.connect(Socket.java:538)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
        at sun.net.www.http.HttpClient.New(HttpClient.java:308)
        at sun.net.www.http.HttpClient.New(HttpClient.java:326)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1316)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1291)
        at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:172)
        at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:229)
        at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:320)
        at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:312)
        at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:307)
        at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(CachedSchemaRegistryClient.java:114)
        at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:153)
        at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:79)
        at io.confluent.connect.avro.AvroConverter$Serializer.serialize(AvroConverter.java:116)
        at io.confluent.connect.avro.AvroConverter.fromConnectData(AvroConverter.java:75)
        at io.confluent.ksql.serde.connect.KsqlConnectSerializer.serialize(KsqlConnectSerializer.java:44)
        at io.confluent.ksql.serde.connect.KsqlConnectSerializer.serialize(KsqlConnectSerializer.java:27)
        at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:65)
        at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:55)
        at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:854)
        at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:816)
        at io.confluent.ksql.datagen.DataGenProducer.populateTopic(DataGenProducer.java:94)
        at io.confluent.ksql.datagen.DataGen.main(DataGen.java:100)

A snippet from my connect properties file:

...
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://host008:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://host008:8081
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
...
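Note the registry mismatch: the datagen log above shows ksql.schema.registry.url = http://localhost:8081 (the default), while the connect properties point to http://host008:8081, which would explain the Connection refused when datagen tries to register the schema. A sketch of the corrected invocation, assuming this ksql-datagen version supports the schemaRegistryUrl parameter:

./bin/ksql-datagen schema=~/impressions.avro bootstrap-server=host009:9092 format=avro schemaRegistryUrl=http://host008:8081 key=impressionid topic=impressions3 maxInterval=1000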

1 answer:

Answer 0 (score: 2)

In your datagen command you have specified format=json, so you are producing JSON data to the Kafka topic. You haven't shared your connector properties file, but since you say the connector works when you use the Avro console producer, my guess is that the connector is configured to deserialize Avro.

So either produce Avro from datagen, or configure the connector to deserialize the data as JSON.
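A sketch of each option, under the stated assumptions:

Option 1 - produce Avro from datagen (the same command as in the question, with format switched to avro; the Schema Registry must be reachable, see the update above):

./bin/ksql-datagen schema=~/impressions.avro bootstrap-server=host009:9092 format=avro key=impressionid topic=impressions2 maxInterval=1000

Option 2 - keep format=json and have Connect deserialize JSON instead. These lines would replace the AvroConverter lines in the worker properties shown in the update (StringConverter for the key is an assumption, based on key=impressionid producing plain string keys):

key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false

With schemaless JSON, the Elasticsearch sink typically also needs these options in quickstart-elasticsearch.properties so it can index records without a Connect schema:

schema.ignore=true
key.ignore=true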