Connecting to Kafka over SSL with KafkaIO on Google Dataflow

Asked: 2019-01-24 00:19:41

Tags: java ssl apache-kafka google-cloud-dataflow apache-beam

From a server, I am able to connect to and fetch data from a topic on a remote Kafka server that is configured with SSL.

How do I connect to the remote Kafka server from a Google Dataflow pipeline on GCP, given the SSL truststore and keystore certificate locations and a Google service account JSON?

I am using the Eclipse plugin for the Dataflow runner options.

When I point the certificates at a Google Cloud Storage bucket, the following error is thrown:


Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: org.apache.kafka.common.KafkaException: Failed to construct kafka consumer

Caused by: org.apache.kafka.common.KafkaException: java.io.FileNotFoundException: gs:/bucket/folder/truststore-client.jks (No such file or directory)

Following: Truststore and Google Cloud Dataflow

Updated the code to point the SSL truststore and keystore locations at the local machine's /tmp directory, in case KafkaIO needs to read them from a file path. It did not throw a FileNotFoundException.

Tried running the server Java client code from the GCP account, and also the Dataflow Beam Java pipeline, and got the following error:


    ssl.truststore.location = <LOCAL MACHINE CERTIFICATE FILE PATH>
    ssl.truststore.password = [hidden]
    ssl.truststore.type = JKS
    value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer

org.apache.kafka.common.utils.AppInfoParser$AppInfo <init>
INFO: Kafka version : 1.0.0
org.apache.kafka.common.utils.AppInfoParser$AppInfo <init>
INFO: Kafka commitId : aaa7af6d4a11b29d
org.apache.kafka.common.network.SslTransportLayer close
WARNING: Failed to send SSL Close message 
java.io.IOException: Broken pipe

org.apache.beam.runners.direct.RootProviderRegistry.getInitialInputs(RootProviderRegistry.java:81)
    at org.apache.beam.runners.direct.ExecutorServiceParallelExecutor.start(ExecutorServiceParallelExecutor.java:153)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:205)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:66)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
    at 

org.apache.kafka.common.utils.LogContext$KafkaLogger warn
WARNING: [Consumer clientId=consumer-1, groupId=test-group] Connection to node -2 terminated during authentication. This may indicate that authentication failed due to invalid credentials.

Any suggestions or examples would be appreciated.

1 answer:

Answer 0 (score: 0):

Git clone the Java Maven project from your local machine, or upload it to the GCP Cloud Shell home directory. Compile the project with the DataflowRunner command from the Cloud Shell terminal:

mvn -Pdataflow-runner compile exec:java \
      -Dexec.mainClass=com.packagename.JavaClass \
      -Dexec.args="--project=PROJECT_ID \
      --stagingLocation=gs://BUCKET/PATH/ \
      --tempLocation=gs://BUCKET/temp/ \
      --output=gs://BUCKET/PATH/output \
      --runner=DataflowRunner"

Make sure the runner is set to DataflowRunner.class; when the job runs on the cloud you will see it on the Dataflow console. DirectRunner executions will not show up on the Cloud Dataflow console. The runner can also be forced in code, as in the sketch below.
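A minimal sketch of setting the runner programmatically (these are the standard Beam Dataflow runner classes; the pipeline arguments are the ones from the mvn command above):

import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

// Parse --project, --stagingLocation, etc., then force the Dataflow runner
// so the job appears on the Dataflow console.
DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
    .withValidation()
    .as(DataflowPipelineOptions.class);
options.setRunner(DataflowRunner.class);
Pipeline p = Pipeline.create(options);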

Place the certificates in the resources folder inside the Maven project and read the files with a ClassLoader:

// resourcePath is assumed to be a Map<String, String> from resource name
// to its absolute path on the local filesystem.
ClassLoader classLoader = getClass().getClassLoader();
File file = new File(classLoader.getResource("keystore.jks").getFile());
resourcePath.put("keystore.jks", file.getAbsoluteFile().getPath());
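Note that classLoader.getResource(...).getFile() only yields a usable path while the resource sits on the local filesystem; once the project is packaged into a jar for the Dataflow workers, the resource has to be read as a stream and copied out, which is what the ConsumerFactoryFn below does.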

Following https://stackoverflow.com/a/53549757/4250322, write a ConsumerFactoryFn() that copies the certificates into Dataflow's "/tmp/" directory on each worker; a sketch follows.
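A minimal sketch of such a factory, assuming the certificates are bundled as truststore.jks and keystore.jks in the jar's resources (both names are illustrative). It stages them under /tmp on the worker before constructing the consumer:

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Copies the bundled certificates onto the worker's local disk, then builds
// the consumer with the config KafkaIO passes in (which already contains the
// /tmp/... SSL paths set via updateConsumerProperties below).
private static class ConsumerFactoryFn
    implements SerializableFunction<Map<String, Object>, Consumer<byte[], byte[]>> {

  @Override
  public Consumer<byte[], byte[]> apply(Map<String, Object> config) {
    try {
      for (String cert : new String[] {"truststore.jks", "keystore.jks"}) {
        try (InputStream in = getClass().getClassLoader().getResourceAsStream(cert)) {
          Files.copy(in, Paths.get("/tmp/" + cert), StandardCopyOption.REPLACE_EXISTING);
        }
      }
    } catch (IOException e) {
      throw new RuntimeException("Failed to stage SSL certificates on the worker", e);
    }
    return new KafkaConsumer<>(config);
  }
}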

Use KafkaIO with the resource path properties:

// updateConsumerProperties expects a Map<String, Object>, so use a HashMap
// rather than java.util.Properties.
Map<String, Object> props = new HashMap<>();
props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, "/tmp/truststore.jks");
props.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, "/tmp/keystore.jks");
props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, PASSWORD);
props.put(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, PASSWORD);
props.put(SslConfigs.SSL_KEY_PASSWORD_CONFIG, PASSWORD);

//other properties
...

PCollection<String> collection = p.apply(KafkaIO.<String, String>read()
                .withBootstrapServers(BOOTSTRAP_SERVERS)
                .withTopic(TOPIC)                                
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)                
                .updateConsumerProperties(props)
                .withConsumerFactoryFn(new ConsumerFactoryFn())
                .withMaxNumRecords(50)
                .withoutMetadata()
        ).apply(Values.<String>create());

// Apply Beam transformations and write to output.
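As an illustrative completion of the elided step, reusing the gs://BUCKET/PATH/output placeholder from the mvn arguments above (org.apache.beam.sdk.io.TextIO):

// Write the consumed records as text to the output location and run the job.
collection.apply(TextIO.write().to("gs://BUCKET/PATH/output"));

p.run().waitUntilFinish();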