How to transfer logs from CentOS to a Hadoop server using Flume without duplicating them?

Time: 2018-01-23 19:21:55

Tags: django hadoop streaming flume flume-ng

I need some help with a Flume configuration. I have a CentOS server where Django logs are generated, and I want to read those logs and ship them to another server. I have two configurations, one for each server. In the source config I use an exec source that runs tail -f, and it successfully transfers the logs from the CentOS server to the Hadoop server. The problem is that each log line is generated once on CentOS but shows up twice on the Hadoop server. What am I doing wrong? Can anyone help? Thank you.

Config on the CentOS server:

    # configure the agent
    agent.sources = r1
    agent.channels = k1
    agent.sinks = c1

    # use a memory channel that holds up to 1000 events
    agent.channels.k1.type = memory
    agent.channels.k1.capacity = 1000
    agent.channels.k1.transactionCapacity = 100

    # connect source, channel, sink
    agent.sources.r1.channels = k1
    agent.sinks.c1.channel = k1

    # tail the Django log file
    agent.sources.r1.type = exec
    agent.sources.r1.command = tail -f /home/bluedata/mysite/log/debug.log

    # connect to the other box over Avro and send the data
    agent.sinks.c1.type = avro
    # NOTE: use Server 2's IP address here
    agent.sinks.c1.hostname = x.x.x.x
    # NOTE: this port should be open on Server 2
    agent.sinks.c1.port = 9049
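For reference, this is roughly how the sending agent could be started (a sketch; the file name centos-agent.conf and the conf directory are illustrative, and the --name value must match the "agent." prefix used in the config):

    flume-ng agent --conf $FLUME_HOME/conf \
        --conf-file centos-agent.conf \
        --name agent \
        -Dflume.root.logger=INFO,console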

Config on the Hadoop server:

    # THIS ONE WRITES TO A FILE
    # configure the agent
    agent.sources = r1
    agent.channels = k1
    agent.sinks = c1

    # use a memory channel that holds up to 1000 events
    agent.channels.k1.type = memory
    agent.channels.k1.capacity = 1000
    agent.channels.k1.transactionCapacity = 100

    # connect source, channel, sink
    agent.sources.r1.channels = k1
    agent.sinks.c1.channel = k1

    # the source listens on the specified port for Avro data
    agent.sources.r1.type = avro
    agent.sources.r1.bind = 0.0.0.0
    agent.sources.r1.port = 9049

    # this is what's different: use a file_roll sink and
    # write files to the specified directory
    agent.sinks.c1.type = file_roll
    # NOTE: change this path to a path on your server
    agent.sinks.c1.sink.directory = /bdaas/debug
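The receiving agent would be started the same way (again a sketch, with an illustrative config file name):

    flume-ng agent --conf $FLUME_HOME/conf \
        --conf-file hadoop-agent.conf \
        --name agent \
        -Dflume.root.logger=INFO,console

One behavioral note on the sink: by default file_roll rolls to a new file every 30 seconds; agent.sinks.c1.sink.rollInterval controls this, and a value of 0 disables rolling.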

Logs on the CentOS server:

"GET / HTTP/1.1" 200 5533 
"GET /contact/ HTTP/1.1" 200 1833    
"GET /blog/ HTTP/1.1" 200 1909

Logs on the Hadoop server (each line is repeated twice):

"GET / HTTP/1.1" 200 5533
"GET / HTTP/1.1" 200 5533
"GET /contact/ HTTP/1.1" 200 1833
"GET /contact/ HTTP/1.1" 200 1833
"GET /blog/ HTTP/1.1" 200 1909
"GET /blog/ HTTP/1.1" 200 1909

0 Answers:

No answers