I need some help with a Flume configuration. I have a CentOS server where Django logs are generated, and I want to read those logs and ship them to another (Hadoop) server. I have two configurations, one for each server. I use an exec source with `tail -f`, and it does transfer the logs from the CentOS server to the Hadoop server successfully. The problem is that each log line is generated once on the CentOS server, but arrives twice on the Hadoop server. Can anyone help me with this? What am I doing wrong? Thank you.
Config in CentOS Server:
# configure the agent
agent.sources = r1
agent.channels = k1
agent.sinks = c1
# using a memory channel to hold up to 1000 events
agent.channels.k1.type = memory
agent.channels.k1.capacity = 1000
agent.channels.k1.transactionCapacity = 100
# connect source, channel, sink
agent.sources.r1.channels = k1
agent.sinks.c1.channel = k1
# cat the file
agent.sources.r1.type = exec
agent.sources.r1.command = tail -f /home/bluedata/mysite/log/debug.log
# connect to another box using AVRO and send the data
agent.sinks.c1.type = avro
agent.sinks.c1.hostname = x.x.x.x
# NOTE: use server 2's IP address here
agent.sinks.c1.port = 9049
# NOTE: this port must be open on server 2
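One detail I noticed while looking into this, in case it is relevant: plain `tail -f` emits the last 10 existing lines of the file when it starts, so any restart of this agent (or of the exec source's command) would re-send those lines downstream. A quick demo of tail's default start point (the `/tmp` path is only for the demo):

```shell
# tail's default start point is the last 10 lines of the file; with -f,
# those lines are re-emitted every time the tailing process (re)starts.
seq 1 15 > /tmp/flume_tail_demo.log

tail /tmp/flume_tail_demo.log | wc -l       # prints 10: the last 10 lines
tail -n 0 /tmp/flume_tail_demo.log | wc -l  # prints 0: start at end of file
```

So if the duplication only affects recent lines after a restart, `tail -n 0 -f` (or the Taildir source, which tracks file positions) might behave differently; I have not confirmed this is the cause here.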
Config in Hadoop Server:
# THIS ONE WRITES TO A FILE
# configure the agent
agent.sources = r1
agent.channels = k1
agent.sinks = c1
# using a memory channel to hold up to 1000 events
agent.channels.k1.type = memory
agent.channels.k1.capacity = 1000
agent.channels.k1.transactionCapacity = 100
# connect source, channel, sink
agent.sources.r1.channels = k1
agent.sinks.c1.channel = k1
# here source is listening at the specified port using AVRO for data
agent.sources.r1.type = avro
agent.sources.r1.bind = 0.0.0.0
agent.sources.r1.port = 9049
# this is what’s different.
# We use file_roll and write file at specified directory.
agent.sinks.c1.type = file_roll
agent.sinks.c1.sink.directory = /bdaas/debug
# NOTE: change this path to match your server
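For completeness, each agent is started with a standard `flume-ng` invocation along these lines (the config file names here are placeholders). I mention it because accidentally launching the CentOS agent twice would run two `tail -f` processes against the same log and double every event at the destination, which matches the symptom:

```shell
# On the CentOS server (the --name value must match the "agent." prefix
# used in the properties file):
flume-ng agent --conf conf --conf-file centos-agent.conf --name agent \
    -Dflume.root.logger=INFO,console

# On the Hadoop server:
flume-ng agent --conf conf --conf-file hadoop-agent.conf --name agent \
    -Dflume.root.logger=INFO,console

# Worth checking on each box: exactly one agent process should be running.
ps -ef | grep [f]lume
```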
Logs on the CentOS server:
"GET / HTTP/1.1" 200 5533
"GET /contact/ HTTP/1.1" 200 1833
"GET /blog/ HTTP/1.1" 200 1909
Logs on the Hadoop server (each line arrives twice):
"GET / HTTP/1.1" 200 5533
"GET / HTTP/1.1" 200 5533
"GET /contact/ HTTP/1.1" 200 1833
"GET /contact/ HTTP/1.1" 200 1833
"GET /blog/ HTTP/1.1" 200 1909
"GET /blog/ HTTP/1.1" 200 1909