我需要使用flume从远程服务器摄取数据到hdfs ::
我使用source作为syslogtcp。我的flume.conf文件如下:
Agent.sources = syslog
Agent.channels = MemChannel
Agent.sinks = HDFS
Agent.sources.syslog.type = syslogtcp
Agent.sources.syslog.channels = MemChannel
Agent.sources.syslog.port = 5140
Agent.sources.syslog.host = localhost
Agent.sinks.HDFS.channel = MemChannel
Agent.sinks.HDFS.type = hdfs
Agent.sinks.HDFS.hdfs.path = hdfs://192.168.111.130:8022/user/cloudera/Twitter/apple_data/%y/%m/%d/
Agent.sinks.HDFS.hdfs.fileType = DataStream
Agent.sinks.HDFS.hdfs.writeFormat = Text
Agent.sinks.HDFS.hdfs.batchSize = 1000
Agent.sinks.HDFS.hdfs.rollSize = 0
Agent.sinks.HDFS.hdfs.rollCount = 10000
Agent.channels.MemChannel.type = memory
Agent.channels.MemChannel.capacity = 10000
Agent.channels.MemChannel.transactionCapacity = 100
我在/etc/rsyslog.d/B2B.conf中有一个文件
# rsyslog v5 configuration file
# For more information see /usr/share/doc/rsyslog-*/rsyslog_conf.html
# If you experience problems, see http://www.rsyslog.com/doc/troubleshoot.html
#### MODULES ####
#$ModLoad imuxsock # provides support for local system logging (e.g. via logger command)
#$ModLoad imklog # provides kernel logging support (previously done by rklogd)
$ModLoad imfile
$InputFileName /home/cloudera/Desktop/my_logs/my_log.txt
$InputFileTag tag1:
$InputFileStateFile stat-file1
$InputFileFacility local7
$InputRunFileMonitor
$InputFilePollingInterval 10
#$ModLoad immark # provides --MARK-- message capability
# Provides UDP syslog reception
#$ModLoad imudp
#$UDPServerRun 514
# Provides TCP syslog reception
#$ModLoad imtcp
#$InputTCPServerRun 514
#### GLOBAL DIRECTIVES ####
# Use default timestamp format
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
# File syncing capability is disabled by default. This feature is usually not required,
# not useful and an extreme performance hit
#$ActionFileEnableSync on
# Include all config files in /etc/rsyslog.d/
$IncludeConfig /etc/rsyslog.d/*.conf
#### RULES ####
# Log all kernel messages to the console.
# Logging much else clutters up the screen.
#kern.* /dev/console
# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none /var/log/messages
# The authpriv file has restricted access.
authpriv.* /var/log/secure
# Log all the mail messages in one place.
mail.* -/var/log/maillog
# Log cron stuff
cron.* /var/log/cron
# Everybody gets emergency messages
*.emerg *
# Save news errors of level crit and higher in a special file.
uucp,news.crit /var/log/spooler
# Save boot messages also to boot.log
local7.* /var/log/boot.log
# ### begin forwarding rule ###
# The statement between the begin ... end define a SINGLE forwarding
# rule. They belong together, do NOT split them. If you create multiple
# forwarding rules, duplicate the whole block!
# Remote Logging (we use TCP for reliable delivery)
#
# An on-disk queue is created for this action. If the remote host is
# down, messages are spooled to disk and sent when it is up again.
$WorkDirectory /var/lib/rsyslog # where to place spool files
$ActionQueueFileName fwdRule1 # unique name prefix for spool files
$ActionQueueMaxDiskSpace 1g # 1gb space limit (use as much as possible)
$ActionQueueSaveOnShutdown on # save messages to disk on shutdown
$ActionQueueType LinkedList # run asynchronously
$ActionResumeRetryCount -1 # infinite retries if host is down
# remote host is: name/ip:port, e.g. 192.168.0.1:514, port optional
*.* @@192.168.111.130:514
# ### end of the forwarding rule ###
现在,当我运行java应用程序来创建log,flume和rsyslog时:
停在
关闭系统记录器: 启动系统记录器: