I'm using Flume to write to Google Cloud Storage. Flume listens on HTTP:9000. It took me a while to get it working (adding the gcs libraries, using a credentials file...), but now it seems to communicate over the network.
I'm sending very small HTTP requests for my tests, and I have plenty of RAM available:
curl -X POST -d '[{ "headers" : { timestamp=1417444588182, env=dev, tenant=myTenant, type=myType }, "body" : "some body ONE" }]' localhost:9000
I get this out-of-memory exception on the very first request (and of course it stops working afterwards):
2014-11-28 16:59:47,748 (hdfs-hdfs_sink-call-runner-0) [INFO - com.google.cloud.hadoop.util.LogUtil.info(LogUtil.java:142)] GHFS version: 1.3.0-hadoop2
2014-11-28 16:59:50,014 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:467)] process failed
java.lang.OutOfMemoryError: Java heap space
at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:76)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.<init>(GoogleHadoopOutputStream.java:79)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.create(GoogleHadoopFileSystemBase.java:820)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
(see the complete stack trace as a gist for more details)
The strange thing is that the folder and file are created exactly the way I want, but the file is empty.
gs://my_bucket/dev/myTenant/myType/2014-12-01/14-36-28.1417445234193.json.tmp
Is there something wrong with the way I configured flume + GCS, or is it a bug in the GCS jar?
Where should I look to gather more data?
PS: I'm running flume-ng in docker.
My flume.conf file:
# Name the components on this agent
a1.sources = http
a1.sinks = hdfs_sink
a1.channels = mem
# Describe/configure the source
a1.sources.http.type = org.apache.flume.source.http.HTTPSource
a1.sources.http.port = 9000
# Describe the sink
a1.sinks.hdfs_sink.type = hdfs
a1.sinks.hdfs_sink.hdfs.path = gs://my_bucket/%{env}/%{tenant}/%{type}/%Y-%m-%d
a1.sinks.hdfs_sink.hdfs.filePrefix = %H-%M-%S
a1.sinks.hdfs_sink.hdfs.fileSuffix = .json
a1.sinks.hdfs_sink.hdfs.round = true
a1.sinks.hdfs_sink.hdfs.roundValue = 10
a1.sinks.hdfs_sink.hdfs.roundUnit = minute
# Use a channel which buffers events in memory
a1.channels.mem.type = memory
a1.channels.mem.capacity = 10000
a1.channels.mem.transactionCapacity = 1000
# Bind the source and sink to the channel
a1.sources.http.channels = mem
a1.sinks.hdfs_sink.channel = mem
Related question from my flume/gcs journey: What is the minimal setup needed to write to HDFS/GS on Google Cloud Storage with flume?
Answer 0 (score: 2)
When uploading files, the GCS Hadoop FileSystem implementation sets aside a fairly large (64MB) write buffer for each FSDataOutputStream (each file opened for writing). This can be changed by setting "fs.gs.io.buffersize.write" to a smaller value (in bytes) in core-site.xml. I'd imagine 1MB would be plenty for low-volume log collection.
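For example, a minimal core-site.xml override might look like this (the 1MB value, i.e. 1048576 bytes, is an assumed starting point for low-volume collection, not a tested figure):

<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Per-stream write buffer for the GCS connector, in bytes.
         Default is 64MB; 1MB is assumed sufficient for low-volume
         log collection. -->
    <name>fs.gs.io.buffersize.write</name>
    <value>1048576</value>
  </property>
</configuration>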
Additionally, check the maximum heap size set when launching the JVM for flume. The flume-ng script sets a default JAVA_OPTS value of -Xmx20m, which limits the heap to 20MB. This can be raised in flume-env.sh (see conf/flume-env.sh.template in the flume tarball distribution for details).
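For example, a flume-env.sh sketch (the 512m figure is an illustrative assumption; size the heap to your sink count and buffer settings):

# flume-env.sh -- sourced by the flume-ng launcher script.
# Raise the heap from the 20MB default; 512m here is an assumed
# starting point, not a tuned value.
JAVA_OPTS="-Xmx512m"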