Question

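I'm new to Flume-NG and need help tailing a file. I have a cluster running Hadoop, with Flume running remotely, and I communicate with the cluster through PuTTY. I want to tail a file on my PC and put it into HDFS on the cluster. I'm using the following configuration: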

# flume.conf: exec source, hdfs sink
# Name the components on this agent 

tier1.sources = r1
tier1.sinks = k1
tier1.channels = c1


# Describe/configure the source
tier1.sources.r1.type = exec
tier1.sources.r1.command = tail -F /(Path to file on my PC)


# Describe the sink
tier1.sinks.k1.type = hdfs
tier1.sinks.k1.hdfs.path = /user/ntimbadi/flume/
tier1.sinks.k1.hdfs.filePrefix = events-
tier1.sinks.k1.hdfs.round = true
tier1.sinks.k1.hdfs.roundValue = 10
tier1.sinks.k1.hdfs.roundUnit = minute



# Use a channel which buffers events in memory
tier1.channels.c1.type = memory
tier1.channels.c1.capacity = 1000
tier1.channels.c1.transactionCapacity = 100


# Bind the source and sink to the channel
tier1.sources.r1.channels = c1
tier1.sinks.k1.channel = c1

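I think the problem is the source. This type of source doesn't take a hostname or IP to look for (which in this case would be my PC). Can someone give me a hint on how to tail a file on my PC and upload it to the remote HDFS with Flume?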

Answer 1

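The exec source in your configuration runs on the machine where you start the Flume agent tier1. If you want to collect data from another machine, you also need to start a Flume agent on that machine. In summary, you need: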

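An agent (remote1) running on the remote machine with an avro source, which listens for events from the collector agent and acts as an aggregator.
An agent (local1) running on your machine (acting as the collector) with an exec source, which sends the data to the remote agent through an avro sink.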

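Alternatively, you could run only one Flume agent, on your local machine (with the same configuration), and set the HDFS path to "hdfs://REMOTE_IP/hdfs/path" (though I'm not entirely sure this will work).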
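A minimal, untested sketch of that single-agent variant: the question's configuration with a fully qualified HDFS path. It assumes the PC can reach the NameNode and the DataNodes (the HDFS client writes blocks to the DataNodes directly) and that the Hadoop client libraries are on Flume's classpath, which the HDFS sink requires. Port 8020 is an assumption; use whatever fs.defaultFS (fs.default.name on older versions) in the cluster's core-site.xml specifies. REMOTE_IP is the placeholder from the answer above.

# Single agent on the PC: exec source -> memory channel -> HDFS sink on the cluster
tier1.sources = r1
tier1.sinks = k1
tier1.channels = c1

tier1.sources.r1.type = exec
tier1.sources.r1.command = tail -F /(Path to file on my PC)
tier1.sources.r1.channels = c1

tier1.channels.c1.type = memory
tier1.channels.c1.capacity = 1000
tier1.channels.c1.transactionCapacity = 100

tier1.sinks.k1.type = hdfs
tier1.sinks.k1.channel = c1
# Fully qualified URI: REMOTE_IP is the NameNode host, 8020 is an assumed RPC port
tier1.sinks.k1.hdfs.path = hdfs://REMOTE_IP:8020/user/ntimbadi/flume/
tier1.sinks.k1.hdfs.filePrefix = events-
# Optional: write plain text instead of the default SequenceFile format
tier1.sinks.k1.hdfs.fileType = DataStream
tier1.sinks.k1.hdfs.round = true
tier1.sinks.k1.hdfs.roundValue = 10
tier1.sinks.k1.hdfs.roundUnit = minute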

Edit: Here is a sample configuration for the two-agent scenario (it may not work without some modification). Remote agent (remote1):

remote1.channels.mem-ch-1.type = memory

remote1.sources.avro-src-1.channels = mem-ch-1
remote1.sources.avro-src-1.type = avro
remote1.sources.avro-src-1.port = 10060
# Replace with the external IP of the machine running remote1 (0.0.0.0 listens on all interfaces)
remote1.sources.avro-src-1.bind = 10.88.66.4

remote1.sinks.k1.channel = mem-ch-1
remote1.sinks.k1.type = hdfs
remote1.sinks.k1.hdfs.path = /user/ntimbadi/flume/
remote1.sinks.k1.hdfs.filePrefix = events-
remote1.sinks.k1.hdfs.round = true
remote1.sinks.k1.hdfs.roundValue = 10
remote1.sinks.k1.hdfs.roundUnit = minute

remote1.sources = avro-src-1
remote1.sinks = k1
remote1.channels = mem-ch-1
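For the collector side, here is a rough, untested sketch of a local agent (local1) that pairs with the remote1 configuration above. The component names src-1, ch-1 and sink-1 and the file name local1.conf are hypothetical. The avro sink's hostname and port must match remote1's bind address and port, and the sketch assumes tail is available on the PC, as the question's own configuration does.

# local1 runs on the PC: tails the file and forwards events to remote1 over Avro
# (component names src-1, ch-1, sink-1 are hypothetical)
local1.sources = src-1
local1.channels = ch-1
local1.sinks = sink-1

# Buffer events in memory (same sizes as the question's channel)
local1.channels.ch-1.type = memory
local1.channels.ch-1.capacity = 1000
local1.channels.ch-1.transactionCapacity = 100

# exec source: runs tail -F on the local file
local1.sources.src-1.type = exec
local1.sources.src-1.command = tail -F /(Path to file on my PC)
local1.sources.src-1.channels = ch-1

# avro sink: must point at the address and port remote1's avro source listens on
local1.sinks.sink-1.type = avro
local1.sinks.sink-1.hostname = 10.88.66.4
local1.sinks.sink-1.port = 10060
local1.sinks.sink-1.channel = ch-1

# Start each agent with its own file and name, remote1 first, e.g.:
# flume-ng agent --conf conf --conf-file local1.conf --name local1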

Flume Tail a File
