Unable to construct Kafka consumer with a keystore in S3

Asked: 2019-10-09 03:19:00

Tags: scala apache-spark amazon-s3 apache-kafka

I have a Spark job that consumes data from a secured Kafka topic. This works when truststore.jks actually exists locally where the job runs. However, if I point Spark at my S3 bucket so it can fetch the JKS file from there, it fails.

Here is what my job looks like:

val kafkaReadStream = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "host1:port1")
    .option("subscribe", "my_topic")
    .option("kafka.ssl.truststore.location", "s3a://a-bucket/a-directory/truststore.jks"
    ...

To give Spark access to my AWS bucket, I launched spark-shell configured as follows:

./spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
    --packages org.apache.hadoop:hadoop-aws:2.7.3,org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.4 \
    --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
    --conf spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY \
    --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_KEY

Here is the error I get:

org.apache.spark.SparkException: Exception thrown in awaitResult:
  ...
  ... 57 elided
Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
  at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:799)
  at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:615)
  at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:596)
  at org.apache.spark.sql.kafka010.SubscribeStrategy.createConsumer(ConsumerStrategy.scala:62)
  at org.apache.spark.sql.kafka010.KafkaOffsetReader.consumer(KafkaOffsetReader.scala:86)
  at org.apache.spark.sql.kafka010.KafkaOffsetReader$$anonfun$fetchTopicPartitions$1.apply(KafkaOffsetReader.scala:119)
  at org.apache.spark.sql.kafka010.KafkaOffsetReader$$anonfun$fetchTopicPartitions$1.apply(KafkaOffsetReader.scala:116)
  at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
  at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at org.apache.spark.sql.kafka010.KafkaOffsetReader$$anon$2$$anon$1.run(KafkaOffsetReader.scala:59)
Caused by: org.apache.kafka.common.KafkaException: org.apache.kafka.common.KafkaException: org.apache.kafka.common.KafkaException: Failed to load SSL keystore s3a://a-bucket/a-directory/truststore.jks of type JKS
  at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:153)
  at org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:140)
  at org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:65)
  at org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:88)
  at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:713)
  ... 11 more
Caused by: org.apache.kafka.common.KafkaException: org.apache.kafka.common.KafkaException: Failed to load SSL keystore s3a://a-bucket/a-directory/truststore.jks of type JKS
  at org.apache.kafka.common.security.ssl.SslFactory.configure(SslFactory.java:137)
  at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:149)
  ... 15 more
Caused by: org.apache.kafka.common.KafkaException: Failed to load SSL keystore s3a://a-bucket/a-directory/truststore.jks of type JKS
  at org.apache.kafka.common.security.ssl.SslFactory$SecurityStore.load(SslFactory.java:330)
  at org.apache.kafka.common.security.ssl.SslFactory.createSSLContext(SslFactory.java:226)
  at org.apache.kafka.common.security.ssl.SslFactory.configure(SslFactory.java:135)
  ... 16 more
Caused by: java.io.FileNotFoundException: s3a:/a-bucket/a-directory/truststore.jks (No such file or directory)
  at java.io.FileInputStream.open0(Native Method)

I know my S3 settings are correct, because I can read a text file that sits in the same directory as the JKS file by running this in the shell:

spark.read.textFile("s3a://a-bucket/a-directory/test.txt").show()
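
Judging from the bottom of the stack trace (java.io.FileInputStream.open0), Kafka's SslFactory seems to open the truststore with plain local-file I/O rather than through Hadoop's s3a filesystem, so the s3a:// URI is treated as a nonexistent local path. One workaround I am considering, sketched below, is to copy the JKS from S3 to the driver's local disk with the Hadoop FileSystem API before building the stream; the /tmp path is an arbitrary choice, and on a cluster the file would also have to reach each executor (e.g. via --files):

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

// Copy the truststore out of S3 onto local disk. Reusing Spark's Hadoop
// configuration picks up the fs.s3a.* credentials set on spark-shell.
val s3Uri = new URI("s3a://a-bucket/a-directory/truststore.jks")
val localPath = "/tmp/truststore.jks"  // arbitrary writable location
val fs = FileSystem.get(s3Uri, spark.sparkContext.hadoopConfiguration)
fs.copyToLocalFile(new Path(s3Uri), new Path(localPath))

// Point Kafka at the local copy instead of the s3a:// URI.
val kafkaReadStream = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "host1:port1")
    .option("subscribe", "my_topic")
    .option("kafka.ssl.truststore.location", localPath)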

0 answers
