Kafka-Spark batch write: WARN clients.NetworkClient: Bootstrap broker disconnected

Time: 2018-08-21 03:34:01

Tags: scala apache-spark apache-kafka jaas sasl

I am trying to write the rows of a Dataframe to a Kafka topic. The Kafka cluster is Kerberized, and I am providing the jaas.conf in the --conf parameters so that the job can authenticate and connect to the cluster. Below is my code:

    object app {
      val conf = new SparkConf().setAppName("Kerberos kafka ")
      val spark = SparkSession.builder().config(conf).enableHiveSupport().getOrCreate()
      System.setProperty("java.security.auth.login.config", "path to jaas.conf")
      spark.sparkContext.setLogLevel("ERROR")

      def main(args: Array[String]): Unit = {
        val test = spark.sql("select * from testing.test")
        test.show()

        println("publishing to kafka...")
        val test_final = test.selectExpr("cast(to_json(struct(*)) as string) AS value")
        test_final.show()

        test_final.write.format("kafka")
          .option("kafka.bootstrap.servers", "XXXXXXXXX:9093")
          .option("topic", "test")
          .option("security.protocol", "SASL_SSL")
          .option("sasl.kerberos.service.name", "kafka")
          .save()
      }
    }

When I run the above code, it fails with the following error:

org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.

When I look at the error logs of the executors, I see:

    18/08/20 22:06:05 INFO producer.ProducerConfig: ProducerConfig values:
        compression.type = none
        metric.reporters = []
        metadata.max.age.ms = 300000
        metadata.fetch.timeout.ms = 60000
        reconnect.backoff.ms = 50
        sasl.kerberos.ticket.renew.window.factor = 0.8
        bootstrap.servers = [xxxxxxxxx:9093]
        retry.backoff.ms = 100
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        buffer.memory = 33554432
        timeout.ms = 30000
        key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        ssl.keystore.type = JKS
        ssl.trustmanager.algorithm = PKIX
        block.on.buffer.full = false
        ssl.key.password = null
        max.block.ms = 60000
        sasl.kerberos.min.time.before.relogin = 60000
        connections.max.idle.ms = 540000
        ssl.truststore.password = null
        max.in.flight.requests.per.connection = 5
        metrics.num.samples = 2
        client.id =
        ssl.endpoint.identification.algorithm = null
        ssl.protocol = TLS
        request.timeout.ms = 30000
        ssl.provider = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        acks = 1
        batch.size = 16384
        ssl.keystore.location = null
        receive.buffer.bytes = 32768
        ssl.cipher.suites = null
        ssl.truststore.type = JKS
        **security.protocol = PLAINTEXT**
        retries = 0
        max.request.size = 1048576
        value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
        ssl.truststore.location = null
        ssl.keystore.password = null
        ssl.keymanager.algorithm = SunX509
        metrics.sample.window.ms = 30000
        partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
        send.buffer.bytes = 131072
        linger.ms = 0
    18/08/20 22:06:05 **INFO utils.AppInfoParser: Kafka version : 0.9.0-kafka-2.0.2**
    18/08/20 22:06:05 INFO utils.AppInfoParser: Kafka commitId : unknown
    18/08/20 22:06:05 INFO datasources.FileScanRDD: Reading File path: hdfs://nameservice1/user/test5/dt=2017-08-04/5a8bb121-3cab-4bed-a32b-9d0fae4a4e8b.parquet, range: 0-142192, partition values: [2017-08-04]
    18/08/20 22:06:05 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 4
    18/08/20 22:06:05 INFO memory.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 33.9 KB, free 5.2 GB)
    18/08/20 22:06:05 INFO broadcast.TorrentBroadcast: Reading broadcast variable 4 took 224 ms
    18/08/20 22:06:05 INFO memory.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 472.2 KB, free 5.2 GB)
    18/08/20 22:06:06 WARN clients.NetworkClient: Bootstrap broker xxxxxxxxx:9093:9093 disconnected
    18/08/20 22:06:06 WARN clients.NetworkClient: Bootstrap broker xxxxxxxxx:9093:9093 disconnected
    18/08/20 22:06:07 WARN clients.NetworkClient: Bootstrap broker xxxxxxxxx:9093:9093 disconnected
    18/08/20 22:06:07 WARN clients.NetworkClient: Bootstrap broker xxxxxxxxx:9093:9093 disconnected
    18/08/20 22:06:07 WARN clients.NetworkClient: Bootstrap broker xxxxxxxxx:9093:9093 disconnected
    18/08/20 22:06:08 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 4
    18/08/20 22:06:08 INFO executor.Executor: Running task 1.0 in stage 2.0 (TID 4)
    18/08/20 22:06:08 WARN clients.NetworkClient: Bootstrap broker xxxxxxxxx:9093:9093 disconnected
    18/08/20 22:06:08 INFO datasources.FileScanRDD: Reading File path: hdfs://nameservice1/user/test5/dt=2017-08-10/2175e5d9-e969-41e9-8aa2-f329b5df06bf.parquet, range: 0-77484, partition values: [2017-08-10]
    18/08/20 22:06:08 WARN clients.NetworkClient: Bootstrap broker xxxxxxxxx:9093:9093 disconnected
    18/08/20 22:06:09 WARN clients.NetworkClient: Bootstrap broker xxxxxxxxx:9093:9093 disconnected
    18/08/20 22:06:09 WARN clients.NetworkClient: Bootstrap broker xxxxxxxxx:9093:9093 disconnected
    18/08/20 22:06:10 WARN clients.NetworkClient: Bootstrap broker xxxxxxxxx:9093:9093 disconnected
    18/08/20 22:06:10 WARN clients.NetworkClient: Bootstrap broker xxxxxxxxx:9093:9093 disconnected

In the log above, I see three conflicting entries:

security.protocol = PLAINTEXT

sasl.kerberos.service.name = null

INFO utils.AppInfoParser: Kafka version : 0.9.0-kafka-2.0.2

I am setting the values of security.protocol and sasl.kerberos.service.name in test_final.write.... Does this mean they are not being passed through the configuration? The Kafka dependency I am using in my jar is:

    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka_2.11</artifactId>
      <version>0.10.2.1</version>
    </dependency>

Is this 0.10.2.1 version conflicting with the cluster's 0.9.0-kafka-2.0.2? Could that be causing the problem?

Here is the jaas.conf that I pass in my spark-submit command:

/* $Id$ */

kinit {
 com.sun.security.auth.module.Krb5LoginModule required;
};

KafkaClient {
 com.sun.security.auth.module.Krb5LoginModule required
 doNotPrompt=true
 useTicketCache=true
 useKeyTab=true
 principal="user@CORP.COM"
 serviceName="kafka"
 keyTab="/data/home/keytabs/user.keytab"
 client=true;
};
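The actual spark-submit invocation is not shown in the question. As a sketch only (master, file names, and paths below are assumptions, not taken from the question), a job against Kerberized Kafka is typically submitted with the jaas.conf shipped to both the driver and the executors:

```shell
# Sketch: ship jaas.conf (and keytab) with the job, then point the JVMs at it.
# The executor side uses a relative path because --files places the file in
# each executor's working directory.
spark2-submit \
  --master yarn --deploy-mode client \
  --files /path/to/jaas.conf,/data/home/keytabs/user.keytab \
  --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=/path/to/jaas.conf" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" \
  --class app app.jar
```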

Any help would be greatly appreciated. Thanks!

1 Answer:

Answer 0 (score: 0):

I am not sure whether this solves your specific problem, but in spark structured streaming the security options must be prefixed with kafka.

So you would have something like this:

    import org.apache.kafka.clients.CommonClientConfigs
    import org.apache.kafka.common.config.SslConfigs

    // `security` is assumed to be your own settings object holding the SSL values
    val securityOptions = Map(
      CommonClientConfigs.SECURITY_PROTOCOL_CONFIG -> security.securityProtocol,
      SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG -> security.sslTrustStoreLocation,
      SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG -> security.sslTrustStorePassword,
      SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG -> security.sslKeyStoreLocation,
      SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG -> security.sslKeyStorePassword,
      SslConfigs.SSL_KEY_PASSWORD_CONFIG -> security.sslKeyPassword
    ).map { case (key, value) => "kafka." + key -> value }

    test_final.write.format("kafka")
      .option("kafka.bootstrap.servers", "XXXXXXXXX:9093")
      .option("topic", "test")
      .options(securityOptions)
      .save()
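The same key-prefixing step can be checked in plain Scala, applied to the exact two options from the question's write call (a minimal sketch; no Spark or Kafka is needed to see the renaming):

```scala
// Plain Kafka producer settings, as passed in the question's write call.
val producerSettings = Map(
  "security.protocol" -> "SASL_SSL",
  "sasl.kerberos.service.name" -> "kafka"
)

// Prefix every key with "kafka." so the Spark Kafka sink forwards it to the
// underlying producer; unprefixed keys are not handed to the Kafka client,
// which is why the executor log showed security.protocol = PLAINTEXT.
val kafkaOptions = producerSettings.map { case (k, v) => ("kafka." + k, v) }
```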