Flume HDFS水槽不断制作.tmp文件

时间:2019-04-23 04:19:31

标签: flume

某些HDFS接收器文件未关闭

有人说,如果接收器进程因诸如超时条件之类的问题而失败,那么它将不会尝试再次关闭文件。

我一直在查看我的水槽日志文件,但是没有错误。 但是,日志文件显示,每个循环中,水槽都会生成两个tmp文件,并且仅关闭一个tmp文件...

对配置的任何建议将不胜感激! 谢谢!

#Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

#Configure the Kafka Source
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 1000
#a1.sources.r1.batchDurationMillis = 2000
a1.sources.r1.kafka.bootstrap.servers = 150.2.237.16:6667,150.2.237.17:6667
a1.sources.r1.kafka.topics = 1-sysmaster1-thread
a1.sources.r1.kafka.consumer.group.id = flume_hdfs_consumer

#Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /user/flume/kafka-data/1-sysmaster1-thread/%y%m%d
a1.sinks.k1.hdfs.filePrefix = 1-sysmaster1-thread-%H%M

#Describing sink with the problem of Encoding
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text

#Describing sink with the problem of many hdfs files
### Roll a file after certain amount of events occurs  ###
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 10000
a1.sinks.k1.hdfs.batchSize = 1000

#Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

#Use File channel
#a1.channels.c1.type = file
#a1.channels.cl.checkpointDir = /home/bigdata/flume/checkpoint
#a1.channels.c1.dataDirs = /home/bigdata/flume/data

#Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

23 4월 2019 11:47:04,105 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1147.1555987622865.tmp
23 4월 2019 11:48:03,382 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSDataStream.configure:57)  - Serializer = TEXT, UseRawLocalFileSystem = false
23 4월 2019 11:48:03,457 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1148.1555987683383.tmp
23 4월 2019 11:48:08,664 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.doClose:438)  - Closing /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1148.1555987683383.tmp
23 4월 2019 11:48:08,689 INFO  [hdfs-k1-call-runner-8] (org.apache.flume.sink.hdfs.BucketWriter$7.call:681)  - Renaming /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1148.1555987683383.tmp to /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1148.1555987683383
23 4월 2019 11:48:08,712 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1148.1555987683384.tmp
23 4월 2019 11:49:03,711 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSDataStream.configure:57)  - Serializer = TEXT, UseRawLocalFileSystem = false
23 4월 2019 11:49:03,806 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1149.1555987743712.tmp
23 4월 2019 11:49:05,439 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.doClose:438)  - Closing /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1149.1555987743712.tmp
23 4월 2019 11:49:05,460 INFO  [hdfs-k1-call-runner-5] (org.apache.flume.sink.hdfs.BucketWriter$7.call:681)  - Renaming /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1149.1555987743712.tmp to /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1149.1555987743712
23 4월 2019 11:49:05,480 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1149.1555987743713.tmp
23 4월 2019 11:50:02,354 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSDataStream.configure:57)  - Serializer = TEXT, UseRawLocalFileSystem = false
23 4월 2019 11:50:02,387 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1150.1555987802355.tmp
23 4월 2019 11:50:03,015 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.doClose:438)  - Closing /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1150.1555987802355.tmp
23 4월 2019 11:50:03,032 INFO  [hdfs-k1-call-runner-4] (org.apache.flume.sink.hdfs.BucketWriter$7.call:681)  - Renaming /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1150.1555987802355.tmp to /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1150.1555987802355

[root@sd-mds-01 logs]# hdfs dfs -ls /user/flume/kafka-data/1-sysmaster1-thread/190423/
Found 163 items
-rw-r--r--   3 root hdfs    1781109 2019-04-23 11:20 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1120.1555986001199
-rw-r--r--   3 root hdfs     212118 2019-04-23 11:20 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1120.1555986001200.tmp
-rw-r--r--   3 root hdfs    1777270 2019-04-23 11:21 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1121.1555986062575
-rw-r--r--   3 root hdfs      54451 2019-04-23 11:21 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1121.1555986062576.tmp
-rw-r--r--   3 root hdfs    1781741 2019-04-23 11:22 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1122.1555986123181
-rw-r--r--   3 root hdfs      34735 2019-04-23 11:22 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1122.1555986123182.tmp
-rw-r--r--   3 root hdfs    1782315 2019-04-23 11:23 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1123.1555986183768
-rw-r--r--   3 root hdfs      28682 2019-04-23 11:23 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1123.1555986183769.tmp
-rw-r--r--   3 root hdfs    1782437 2019-04-23 11:24 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1124.1555986244304
-rw-r--r--   3 root hdfs     211547 2019-04-23 11:24 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1124.1555986244305.tmp
-rw-r--r--   3 root hdfs    1782775 2019-04-23 11:25 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1125.1555986302891
-rw-r--r--   3 root hdfs      35918 2019-04-23 11:25 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1125.1555986302892.tmp
-rw-r--r--   3 root hdfs    1781180 2019-04-23 11:26 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1126.1555986362097
-rw-r--r--   3 root hdfs      30967 2019-04-23 11:26 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1126.1555986362098.tmp
-rw-r--r--   3 root hdfs    1781682 2019-04-23 11:27 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1127.1555986423432
-rw-r--r--   3 root hdfs      41381 2019-04-23 11:27 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1127.1555986423433.tmp
-rw-r--r--   3 root hdfs    1781710 2019-04-23 11:28 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1128.1555986483928
-rw-r--r--   3 root hdfs     211240 2019-04-23 11:28 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1128.1555986483929.tmp
-rw-r--r--   3 root hdfs    1785456 2019-04-23 11:29 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1129.1555986542442

1 个答案:

答案 0 :(得分:0)

我发现了问题...

我将配置设置为以时间前缀滚动文件

a1.sinks.k1.hdfs.filePrefix = 1-sysmaster1-thread-%H%M

您会看到结果路径,在分钟结束时创建的每个文件都无法正确汇总。

从配置文件中删除该行后,它工作正常。