Problem copying a file into HDFS with Flume

Time: 2015-11-07 08:31:55

Tags: flume flume-ng

I have a file on my local file system that I want to move into HDFS using Flume.

hduser@ubuntu:~$ ls -ltr /home/hduser/Desktop/flume_test_dir/
total 7060
-rwxrw-rw- 1 hduser hduser 7226791 Nov  6 10:31 airports.csv
hduser@ubuntu:~$ hadoop fs -ls hdfs://localhost:54310//user/hduser/flume/spool5
Found 2 items
-rw-r--r--   1 hduser supergroup          0 2015-11-07 00:20 hdfs://localhost:54310/user/hduser/flume/spool5/FlumeData.1446884442571.tmp
-rw-r--r--   1 hduser supergroup     137560 2015-11-07 00:21 hdfs://localhost:54310/user/hduser/flume/spool5/FlumeData.1446884464560.tmp

So my actual file is 7226791 bytes. After Flume runs, it creates two files, of 137560 and 0 bytes.

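So the problem is that the whole file is not copied into HDFS, and it is also split into two files. I want to move it as a single file, and I want the entire file moved. I am using the configuration below. What needs to change?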

#Flume Configuration Starts
# Define a file channel called fileChannel1_1 on agent_slave_1
agent_slave_1.channels.fileChannel1_1.type = file
# on Ubuntu FS
agent_slave_1.channels.fileChannel1_1.capacity = 200000
agent_slave_1.channels.fileChannel1_1.transactionCapacity = 1000
# Define a source for agent_slave_1
agent_slave_1.sources.source1_1.type = spooldir

# on Ubuntu FS
#Spooldir in my case is /home/hduser/Desktop/flume_test_dir
agent_slave_1.sources.source1_1.spoolDir = /home/hduser/Desktop/flume_test_dir/
agent_slave_1.sources.source1_1.fileHeader = false
agent_slave_1.sources.source1_1.fileSuffix = .COMPLETED
agent_slave_1.sinks.hdfs-sink1_1.type = hdfs

#Sink is /user/hduser/flume/spool5/ under hdfs
agent_slave_1.sinks.hdfs-sink1_1.hdfs.path = hdfs://localhost:54310/user/hduser/flume/spool5/
agent_slave_1.sinks.hdfs-sink1_1.hdfs.batchSize = 1000
agent_slave_1.sinks.hdfs-sink1_1.hdfs.rollSize = 268435456
agent_slave_1.sinks.hdfs-sink1_1.hdfs.rollInterval = 0
agent_slave_1.sinks.hdfs-sink1_1.hdfs.rollCount = 50000000
agent_slave_1.sinks.hdfs-sink1_1.hdfs.writeFormat=Text

agent_slave_1.sinks.hdfs-sink1_1.hdfs.fileType = DataStream
agent_slave_1.sources.source1_1.channels = fileChannel1_1
agent_slave_1.sinks.hdfs-sink1_1.channel = fileChannel1_1

agent_slave_1.sinks =  hdfs-sink1_1
agent_slave_1.sources = source1_1
agent_slave_1.channels = fileChannel1_1
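A possible explanation, offered as an assumption rather than a confirmed diagnosis: both files in the listing still end in `.tmp`, which is the HDFS sink's default in-use suffix (`hdfs.inUseSuffix`), so they were probably still open when listed, and `hadoop fs -ls` may not report the full length of a file that is still being written. With `rollInterval = 0`, `rollSize` at 256 MiB and `rollCount` at 50,000,000 events, nothing in this configuration closes the file while the agent runs, because a 7,226,791-byte file never reaches those thresholds. The HDFS sink's `hdfs.idleTimeout` (in seconds; default 0, meaning idle files are never closed automatically) is one way to force the close. The fragment below is a minimal sketch; the 60-second value is an arbitrary example.

```properties
# Sketch, not tested against this setup: close the HDFS file (dropping the
# .tmp in-use suffix) after it has received no events for 60 seconds.
agent_slave_1.sinks.hdfs-sink1_1.hdfs.idleTimeout = 60
# Explicitly disable count- and size-based rolling (0 = never roll on that trigger),
# so the time-based setting above (rollInterval = 0) is not the only guard against splits.
agent_slave_1.sinks.hdfs-sink1_1.hdfs.rollCount = 0
agent_slave_1.sinks.hdfs-sink1_1.hdfs.rollSize = 0
```

Even with this change, Flume moves events (by default one per line from the spooling directory source), not files, so it does not guarantee exactly one HDFS file per input file. For a one-off copy of a single local file, `hadoop fs -put` is the more direct tool.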

0 answers

No answers yet.