我正在使用Scala 2.12,并且具有将消息转换为Avro(需要转换)和kafka客户端所需的库。
我正在运行其他应用程序(Apache NiFi)的Linux主机(dev)上运行代码,并且能够创建KafkaProducer并将消息发布到远程Kafka。
由于目前是开发人员,因此协议为PLAINTEXT。
例如Nifi中的KafkaProducer配置的说明。
acks = 1
batch.size = 16384
block.on.buffer.full = false
bootstrap.servers = [server1.cloud.domain:9096, server2.cloud.domain:9096, server3.cloud.domain:9096]
buffer.memory = 33554432
client.id =
compression.type = none
connections.max.idle.ms = 540000
interceptor.classes = null
key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
linger.ms = 0
max.block.ms = 5000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.fetch.timeout.ms = 60000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 0
retry.backoff.ms = 100
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = kafka
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
timeout.ms = 30000
value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
此外,NiFi还从java选项开始使用JAAS文件,其内容为:
KafkaClient {
com.sun.security.auth.module.Krb5LoginModule required
principal="myUserName@myRealm"
useKeyTab=true
client=true
keyTab="/path/myfile.keytab"
serviceName="kafka";
};
还可以使用krb5.conf文件。
通过上述配置,NiFi可以创建KafkaProducer并向其发送消息。
现在,我在Scala代码中使用相同的代码。简单类,它使用以下build.sbt和代码来发送消息。
build.sbt:
// https://mvnrepository.com/artifact/org.apache.avro/avro
libraryDependencies += "org.apache.avro" % "avro" % "1.8.1"
// https://mvnrepository.com/artifact/org.apache.kafka/kafka
libraryDependencies += "org.apache.kafka" %% "kafka" % "2.1.1"
libraryDependencies += "org.slf4j" % "slf4j-simple" % "1.6.4"
fork in run := true
javaOptions += "-Djava.security.auth.login.config=/path/to/jaas/kafka-jaas.conf"
javaOptions += "-Djava.security.krb5.conf=/path/to/krb/krb5.conf"
我要发送消息的代码。为简洁起见,删除了多余的行。请注意,向Avro创建数据的测试运行良好。如果将相同的消息发送给NiFi,则可以正确发布到该主题。没有运行,是使用Scala发布到kafka。
代码:
package example
import java.io.ByteArrayOutputStream
import java.util
import java.io.File
import java.util.{Properties, UUID}
import org.apache.avro.Schema.Parser
import org.apache.avro.Schema
import org.apache.avro.file.DataFileWriter
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.specific.SpecificDatumWriter
import org.apache.avro.generic.GenericData.Record
import org.apache.avro.io.{DecoderFactory, EncoderFactory}
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer
import scala.io.Source
import scala.io.StdIn
object Hello extends Greeting with App {
// case classes for creating avro record
// This part works fine.
val schemaFile = "/path/Schema.avsc"
val schema = new Schema.Parser().parse(new File(schemaFile))
val reader = new GenericDatumReader[GenericRecord](schema)
val avroRecord = new GenericData.Record(schema)
// populate correctly the record.
// works fine.
val brokers = "server1.domain:9096,server2.domain:9096,server3.domain:9096"
val topic = "myTopic"
private def configuration: Properties = {
val props = new Properties()
props.put("bootstrap.servers", brokers)
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")
props.put("security.protocol", "PLAINTEXT")
props.put("sasl.kerberos.service.name", "kafka")
props.put("acks", "all")
props.put("retries","0")
props
}
val producer = new KafkaProducer[String, Array[Byte]](configuration)
val writer = new SpecificDatumWriter[GenericRecord](schema)
val out = new ByteArrayOutputStream()
val encoder = EncoderFactory.get.binaryEncoder(out, null)
writer.write(avroRecord, encoder)
encoder.flush()
out.close()
val serializedBytes: Array[Byte] = out.toByteArray()
val recordToSend = new ProducerRecord[String, Array[Byte]](topic, serializedBytes)
producer.send(recordToSend)
}
trait Greeting {
lazy val greeting: String = "hello"
}
当我在sbt命令行上运行它时:
sbt clean
sbt编译
sbt运行
我得到以下错误/输出。什么都没发表。
输出:
-bash-4.2$ sbt run
[warn] Executing in batch mode.
[warn] For better performance, hit [ENTER] to switch to interactive mode, or
[warn] consider launching sbt without any commands, or explicitly passing 'shell'
[info] Loading project definition from /path/Scala/hello-world/project
[info] Set current project to hello-world (in build file:/path/Scala/hello-world/)
[info] Running example.Hello
[info] hello
[info]
[error] 9 [main] INFO org.apache.kafka.clients.producer.ProducerConfig - ProducerConfig values:
[error] acks = 1
[error] batch.size = 16384
[error] bootstrap.servers = [server1.cloud.domain:9096, server2.cloud.domain:9096, server3.cloud.domain:9096]
[error] buffer.memory = 33554432
[error] client.dns.lookup = default
[error] client.id =
[error] compression.type = none
[error] connections.max.idle.ms = 540000
[error] delivery.timeout.ms = 120000
[error] enable.idempotence = false
[error] interceptor.classes = []
[error] key.serializer = class org.apache.kafka.common.serialization.StringSerializer
[error] linger.ms = 0
[error] max.block.ms = 60000
[error] max.in.flight.requests.per.connection = 5
[error] max.request.size = 1048576
[error] metadata.max.age.ms = 300000
[error] metric.reporters = []
[error] metrics.num.samples = 2
[error] metrics.recording.level = INFO
[error] metrics.sample.window.ms = 30000
[error] partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
[error] receive.buffer.bytes = 32768
[error] reconnect.backoff.max.ms = 1000
[error] reconnect.backoff.ms = 50
[error] request.timeout.ms = 30000
[error] retries = 0
[error] retry.backoff.ms = 100
[error] sasl.client.callback.handler.class = null
[error] sasl.jaas.config = null
[error] sasl.kerberos.kinit.cmd = /usr/bin/kinit
[error] sasl.kerberos.min.time.before.relogin = 60000
[error] sasl.kerberos.service.name = kafka
[error] sasl.kerberos.ticket.renew.jitter = 0.05
[error] sasl.kerberos.ticket.renew.window.factor = 0.8
[error] sasl.login.callback.handler.class = null
[error] sasl.login.class = null
[error] sasl.login.refresh.buffer.seconds = 300
[error] sasl.login.refresh.min.period.seconds = 60
[error] sasl.login.refresh.window.factor = 0.8
[error] sasl.login.refresh.window.jitter = 0.05
[error] sasl.mechanism = GSSAPI
[error] security.protocol = PLAINTEXT
[error] send.buffer.bytes = 131072
[error] ssl.cipher.suites = null
[error] ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
[error] ssl.endpoint.identification.algorithm =
[error] ssl.key.password = null
[error] ssl.keymanager.algorithm = SunX509
[error] ssl.keystore.location = null
[error] ssl.keystore.password = null
[error] ssl.keystore.type = JKS
[error] ssl.protocol = TLS
[error] ssl.provider = null
[error] ssl.secure.random.implementation = null
[error] ssl.trustmanager.algorithm = PKIX
[error] ssl.truststore.location = null
[error] ssl.truststore.password = null
[error] ssl.truststore.type = JKS
[error] transaction.timeout.ms = 60000
[error] transactional.id = null
[error] value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
[error]
[error] 109 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version : 2.1.1
[error] 109 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : 21234bee31165527
[error] 248 [kafka-producer-network-thread | producer-1] INFO org.apache.kafka.clients.Metadata - Cluster ID: 5NMDh7lDS-SxXpgprjR6oA
[success] Total time: 1 s, completed Mar 6, 2019 1:38:14 PM
我敢肯定,它必须在安全性或kerberos方面做些事情。但是其他应用程序可以推送消息,而不能使用我的scala代码。
更新:
基于@tgrez的响应,我尝试阻止Future get。
//producer.send(recordToSend)
val metaF: Future[RecordMetadata] = producer.send(recordToSend)
val meta = metaF.get() //blocking
val msgLog =
s"""
|offset = ${meta.offset()}
|partition = ${meta.partition()}
|topic = ${meta.topic()}
""".stripMargin
println(msgLog)
producer.close()
但是我还是类似的错误。
[error] 10 [main] INFO org.apache.kafka.clients.producer.ProducerConfig - ProducerConfig values:
[error] acks = 1
[error] batch.size = 16384
[error] bootstrap.servers = [server1.cloud.domain:9096, server2.cloud.domain:9096, server3.cloud.domain:9096]
[error] buffer.memory = 33554432
[error] client.dns.lookup = default
[error] client.id =
[error] compression.type = none
[error] connections.max.idle.ms = 540000
[error] delivery.timeout.ms = 120000
[error] enable.idempotence = false
[error] interceptor.classes = []
[error] key.serializer = class org.apache.kafka.common.serialization.StringSerializer
[error] linger.ms = 0
[error] max.block.ms = 60000
[error] max.in.flight.requests.per.connection = 5
[error] max.request.size = 1048576
[error] metadata.max.age.ms = 300000
[error] metric.reporters = []
[error] metrics.num.samples = 2
[error] metrics.recording.level = INFO
[error] metrics.sample.window.ms = 30000
[error] partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
[error] receive.buffer.bytes = 32768
[error] reconnect.backoff.max.ms = 1000
[error] reconnect.backoff.ms = 50
[error] request.timeout.ms = 30000
[error] retries = 0
[error] retry.backoff.ms = 100
[error] sasl.client.callback.handler.class = null
[error] sasl.jaas.config = null
[error] sasl.kerberos.kinit.cmd = /usr/bin/kinit
[error] sasl.kerberos.min.time.before.relogin = 60000
[error] sasl.kerberos.service.name = kafka
[error] sasl.kerberos.ticket.renew.jitter = 0.05
[error] sasl.kerberos.ticket.renew.window.factor = 0.8
[error] sasl.login.callback.handler.class = null
[error] sasl.login.class = null
[error] sasl.login.refresh.buffer.seconds = 300
[error] sasl.login.refresh.min.period.seconds = 60
[error] sasl.login.refresh.window.factor = 0.8
[error] sasl.login.refresh.window.jitter = 0.05
[error] sasl.mechanism = GSSAPI
[error] security.protocol = PLAINTEXT
[error] send.buffer.bytes = 131072
[error] ssl.cipher.suites = null
[error] ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
[error] ssl.endpoint.identification.algorithm =
[error] ssl.key.password = null
[error] ssl.keymanager.algorithm = SunX509
[error] ssl.keystore.location = null
[error] ssl.keystore.password = null
[error] ssl.keystore.type = JKS
[error] ssl.protocol = TLS
[error] ssl.provider = null
[error] ssl.secure.random.implementation = null
[error] ssl.trustmanager.algorithm = PKIX
[error] ssl.truststore.location = null
[error] ssl.truststore.password = null
[error] ssl.truststore.type = JKS
[error] transaction.timeout.ms = 60000
[error] transactional.id = null
[error] value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
[error]
[error] 110 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version : 2.1.1
[error] 110 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : 21234bee31165527
[error] 249 [kafka-producer-network-thread | producer-1] INFO org.apache.kafka.clients.Metadata - Cluster ID: 5NMDh7lDS-SxXpgprjR6oA
[info]
[info] offset = 8
[info] partition = 1
[info] topic = myTopic
[info]
[error] 323 [main] INFO org.apache.kafka.clients.producer.KafkaProducer - [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
[success] Total time: 1 s, completed Mar 6, 2019 3:26:53 PM
这里我想念什么吗?
更新2:
如下所述,我更改了代码。但是它也不起作用。我意识到序列化有问题。
我已经有GenericData.Record格式的avroRecord。我不能使用相同的数据将数据发布到Kafka吗?为什么我必须使用字节数组或其他任何串行器来实现相同的目的?
我发现的唯一示例是使用io.confluent avro序列化程序。但是我无法使用它,因为sbt或maven现在无法下载。实际使用URL:http://packages.confluent.io/maven/不起作用。我不知何故下载了jar并将其用作外部库。
更改为代码:
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer")
val producer = new KafkaProducer[String, GenericData.Record](configuration)
val recordToSend = new ProducerRecord[String, GenericData.Record](topic, avroRecord)
现在一切正常。
但是,我仍在寻找任何其他序列化器类(在Maven中可用)以GenericData而不是字节数组的形式发送消息。
更新3:
根据用户@KZapagol的建议,我尝试使用相同的代码并得到以下错误。
模式:(它很复杂,所以如果我正确地转换数据,则需要帮助)
{"type": "record","name": "MyPnl","doc": "This schema contains the metadata fields wrapped in a header field which follows the official schema.","fields": [{"name":"header","type":{"type":"record","name":"header","fields":[{"name":"messageId","type":"string"},{"name":"businessId","type":"string"},{"name":"batchId","type":"string"},{"name":"sourceSystem","type":"string"},{"name":"secondarySourceSystem","type":[ "null", "string" ]},{"name":"sourceSystemCreationTimestamp","type":"long","logicalType": "timestamp-millis"},{"name":"sentBy","type":"string"},{"name":"sentTo","type":"string"},{"name":"messageType","type":"string"},{"name":"schemaVersion","type":"string"},{"name":"processing","type":"string"},{"name":"recordOffset","type":[ "null", "string" ]}]}},{"name":"pnlData","type":{"type":"record","name":"pnlData","fields":[{"name":"pnlHeader","type":{"type":"record","name":"pnlData","namespace":"pnlHeader","fields":[{"name":"granularity","type":"string"},{"name":"pnlType","type":"string"},{"name":"pnlSubType","type":"string"},{"name":"businessDate","type":"string","logicalType": "date"},{"name":"bookId","type":"string"},{"name":"bookDescription","type":"string"},{"name":"pnlStatus","type":"string"}]}},{"name":"pnlBreakDown","type":{"type":"array","items":{"type":"record","name":"pnlData","namespace":"pnlBreakDown","fields":[{"name":"category","type":[ "null", "string" ]},{"name":"subCategory","type":[ "null", "string" ]},{"name":"riskCategory","type":[ "null", "string" ]},{"name":"pnlCurrency","type":"string"},{"name":"pnlDetails", "type":{"type":"array","items": {"type":"record","name":"pnlData","namespace":"pnlDetails","fields":[{"name":"pnlLocalAmount","type":"double"},{"name":"pnlCDEAmount","type":"double"}]}}}]}}}]}}]}
我上面有相应的案例类。 (请建议我是否在这里错过了什么?)
case class MessageHeader( messageId: String,
businessId: String,
batchId: String,
sourceSystem: String,
secondarySourceSystem: String,
sourceSystemCreationTimestamp: Long,
sentBy: String,
sentTo: String,
messageType: String,
schemaVersion: String,
processing: String,
recordOffset: String
)
case class PnlHeader ( granularity: String,
pnlType: String,
pnlSubType: String,
businessDate: String,
bookId: String,
bookDescription: String,
pnlStatus: String
)
case class PnlDetails ( pnlLocalAmount: Double,
pnlCDEAmount: Double
)
case class PnlBreakdown ( category: String,
subCategory: String,
riskCategory: String,
pnlCurrency: String,
pnlDetails: List[PnlDetails]
)
case class PnlData ( pnlHeader: PnlHeader, pnlBreakdown: List[PnlBreakdown] )
case class PnlRecord (header: MessageHeader, pnlData: PnlData )
我已经以上述PnlRecord格式对数据进行了建模。我有这样的记录清单。
从此类记录的列表中,我进行迭代并尝试将其发布到Kafka。
// Create Producer
val producer = new KafkaProducer[String, Array[Byte]](properties)
// This filename is file where above schema is saved.
val avroJsonSchema = Source.fromFile(new File(schemaFileName)).getLines.mkString
val avroMessage = new AvroMessage(avroJsonSchema)
val avroRecord = new Record(avroMessage.schema)
// recordListToSend is of type: List[PnlRecord]
for (record <- recordListToSend) {
avroRecord.put("header", record.header)
avroRecord.put("pnlData", record.pnlData)
//logger.info(s"Record: ${avroRecord}\n")
avroMessage.gdw.write(avroRecord, EncoderFactory.get().binaryEncoder(avroMessage.baos, null))
avroMessage.dfw.append(avroRecord)
avroMessage.dfw.close()
val bytes = avroMessage.baos.toByteArray
// send data
producer.send(new ProducerRecord[String, Array[Byte]](topic, bytes), new ProducerCallback)
//flush data
producer.flush()
//flush and close producer
producer.close()
}
AvroMessage类(由用户建议)
import java.io.ByteArrayOutputStream
import org.apache.avro
import org.apache.avro.Schema
import org.apache.avro.file.CodecFactory
import org.apache.avro.generic.{GenericDatumWriter, GenericRecord}
class AvroMessage(avroJsonSchema: String) {
val parser = new Schema.Parser()
val schema = parser.parse(avroJsonSchema)
val baos = new ByteArrayOutputStream()
val gdw = new GenericDatumWriter[GenericRecord](schema)
val dfw = new avro.file.DataFileWriter[GenericRecord](gdw)
val compressionLevel = 5
dfw.setCodec(CodecFactory.deflateCodec(compressionLevel))
dfw.create(schema, baos)
}
我收到以下错误:
2019-03-13 16:00:09.855 [application-akka.actor.default-dispatcher-11] ERROR controllers.SAController.$anonfun$publishToSA$2(34) - com.domain.sa.model.MessageHeader cannot be cast to org.apache.avro.generic.IndexedRecord
java.lang.ClassCastException: ca.domain.my.sa.model.MessageHeader cannot be cast to org.apache.avro.generic.IndexedRecord
at org.apache.avro.generic.GenericData.getField(GenericData.java:697)
at org.apache.avro.generic.GenericData.getField(GenericData.java:712)
at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:164)
at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
at ca.domain.my.sa.dao.myPnlDao$.$anonfun$publishAvroToKafka$1(myPnlDao.scala:95)
根据模式,我原来的案例类是否正确?
我的MessageHeader案例类如上所示。
我的模式显示在上方(已更新)。
我的记录:
Record: {"header": Header(my_20190313180602_00000011,my_BookLevel_Daily_Regular_20181130_EMERGINGTRS,11_20181130_8259,my,null,65162584,my,SA,PnLMessage,test,RealTime,null), "pnlData": PnlData(PnlHeader(BookLevel,Daily,Regular,2018-11-30,8259,EMERGINGTRS,Locked),List(PnlBreakdown(null,null,null,eur,List(PnlDetails(0.0,0.0022547507286072))), PnlBreakdown(null,null,null,jpy,List(PnlDetails(0.0,0.0))), PnlBreakdown(null,null,null,usd,List(PnlDetails(0.19000003399301,0.642328574985149))), PnlBreakdown(null,null,null,brl,List(PnlDetails(2.65281414613128E-8,2.4107750505209E-5))), PnlBreakdown(null,null,null,gbp,List(PnlDetails(0.0,-5.05781173706088E-5))), PnlBreakdown(null,null,null,cad,List(PnlDetails(145.399999991953,145.399999991953)))))}
答案 0 :(得分:0)
它可能比看起来简单。 send
方法是异步的,它返回一个Future<RecordMetadata>
。您的示例在实际发送消息之前就退出了。
Kafka生产者在后台批量处理消息,因此,要确保发送消息,您应该使用例如Future.get
(这意味着等待代理响应元数据)或确保使用kafkaProducer.flush()
刷新缓冲区。
在测试中,我建议阻止Future
。