Using a Kafka producer - reading the contents of a file with the FileSource connector

Date: 2018-02-09 08:29:20

Tags: apache-kafka apache-kafka-connect

How can I use a Kafka producer to read the contents of a file? The typical solution found here (piping the file into the producer via |) looks dirty and ugly.

1 Answer:

Answer 0 (score: 1)

I recently found a solution that is more suitable than piping the file contents into the producer shell: using the FileSource Connector.
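For contrast, the piping approach being replaced looks roughly like this (broker address, topic name and file path are illustrative):

    bin/kafka-console-producer.sh --broker-list localhost:9092 --topic connect-test < /tmp/test.txt

This works for a one-off load, but it reads the file once and exits, keeping no offset, while the connector keeps watching the file for new lines.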

According to the link, the FileSource Connector is designed for exactly this "read file data into a producer" use case, e.g. watching the contents of a log file and raising an alert whenever [FATAL] or [ERROR] is encountered.

The complete command is (assuming we are in Kafka's root folder):

    bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties

There are two properties files to configure:

  • config/connect-standalone.properties
  • config/connect-file-source.properties

The first one defines how the standalone Connect worker runs. It looks like:

    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements. See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # The ASF licenses this file to You under the Apache License, Version 2.0
    # (the "License"); you may not use this file except in compliance with
    # the License. You may obtain a copy of the License at
    #
    #    http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    # These are defaults. This file just demonstrates how to override some settings.
    bootstrap.servers=localhost:9092

    # The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
    # need to configure these based on the format they want their data in when loaded from or stored into Kafka
    key.converter=org.apache.kafka.connect.json.JsonConverter
    value.converter=org.apache.kafka.connect.json.JsonConverter
    # Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
    # it to
    key.converter.schemas.enable=false
    value.converter.schemas.enable=false

    # The internal converter used for offsets and config data is configurable and must be specified, but most users will
    # always want to use the built-in default. Offset and config data is never visible outside of Kafka Connect in this format.
    internal.key.converter=org.apache.kafka.connect.json.JsonConverter
    internal.value.converter=org.apache.kafka.connect.json.JsonConverter
    internal.key.converter.schemas.enable=false
    internal.value.converter.schemas.enable=false

    offset.storage.file.filename=/tmp/connect.offsets
    # Flush much faster than normal, which is useful for testing/debugging
    offset.flush.interval.ms=10000

    # Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins
    # (connectors, converters, transformations). The list should consist of top level directories that include
    # any combination of:
    # a) directories immediately containing jars with plugins and their dependencies
    # b) uber-jars with plugins and their dependencies
    # c) directories immediately containing the package directory structure of classes of plugins and their dependencies
    # Note: symlinks will be followed to discover dependencies or plugins.
    # Examples:
    # plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
    #plugin.path=

Fairly straightforward. Only two things need attention:

  • bootstrap.servers=localhost:9092: the Kafka bootstrap servers
  • (internal.)key/value.converter.schemas.enable: these must be set to false so that plain string lines from the file can be parsed.
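To illustrate why the schemas.enable flags matter: the connector reads each line as a string value, and with the JsonConverter the flag controls whether that string is wrapped in a schema envelope. A sketch of what lands in the topic for the input line hello, as I understand the JsonConverter's behavior:

    # value.converter.schemas.enable=true  -> the value is wrapped in an envelope:
    {"schema":{"type":"string","optional":false},"payload":"hello"}

    # value.converter.schemas.enable=false -> the value is just the plain JSON string:
    "hello"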

The second file is simpler:

    name=local-file-source
    connector.class=FileStreamSource
    tasks.max=1
    file=/tmp/test.txt
    topic=connect-test

  • file: the file to read from
  • topic: creates a topic for the consumer to listen on
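With the connector running, a quick end-to-end check is to append a line to the watched file and read it back with the console consumer (paths and topic as configured above):

    echo "[ERROR] something went wrong" >> /tmp/test.txt
    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning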

If you want to consume the content with Storm, that is enough.

If you want to write content from Kafka to a file instead of reading a file into Kafka, use the FileSink Connector. I haven't used it personally, but I suppose it works the same way, just on the consumer side. Its configuration file is config/connect-file-sink.properties.
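For reference, a minimal sink configuration in the same style; this mirrors the connect-file-sink.properties sample shipped with Kafka, with illustrative file and topic values. Note that sinks take topics (plural) rather than topic:

    name=local-file-sink
    connector.class=FileStreamSink
    tasks.max=1
    file=/tmp/test.sink.txt
    topics=connect-test

It would be started the same way: bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-sink.properties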