Can a Flume Avro sink deliver to a Node.js server?

Asked: 2013-02-16 03:56:54

Tags: node.js flume avro

First-time Stack Overflow asker here... I will try to include as much detail as possible.

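I am trying to send Apache Flume log data through an Avro sink to a Node.js server listening on a specific port. I plan to use Collective Media's node-avro library to handle serialization between Avro's binary format and JSON, so that I can process the data in Node.js (I pass it on to clients through socket.io pub/sub).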

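I am fairly sure Flume is configured correctly, because I can see data flowing through the channel and being written to the console (for debugging only, I also sink the data to the console). When I enable the Avro sink and bring up the Node.js server listening on the same port, Flume throws an exception when it attempts the Avro transfer: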

2013-02-15 22:06:09,858 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to send events
    at org.apache.flume.sink.AvroSink.process(AvroSink.java:325)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: localhost, port: 4242 }: Failed to send batch
    at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:236)
    at org.apache.flume.sink.AvroSink.process(AvroSink.java:309)
    ... 3 more
Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: localhost, port: 4242 }: Exception thrown from remote handler
    at org.apache.flume.api.NettyAvroRpcClient.waitForStatusOK(NettyAvroRpcClient.java:318)
    at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:295)
    at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:224)
    ... 4 more
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: NettyTransceiver closed
    at org.apache.avro.ipc.CallFuture.get(CallFuture.java:128)
    at org.apache.flume.api.NettyAvroRpcClient.waitForStatusOK(NettyAvroRpcClient.java:310)
    ... 6 more
Caused by: java.io.IOException: NettyTransceiver closed
    at org.apache.avro.ipc.NettyTransceiver.disconnect(NettyTransceiver.java:338)
    at org.apache.avro.ipc.NettyTransceiver.access$200(NettyTransceiver.java:59)
    at org.apache.avro.ipc.NettyTransceiver$NettyClientAvroHandler.handleUpstream(NettyTransceiver.java:496)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:792)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:348)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:236)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:93)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:476)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:623)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:101)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:364)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:238)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:38)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    ... 1 more
2013-02-15 22:06:14,895 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.AvroSink.createConnection(AvroSink.java:178)] Avro sink k1: Building RpcClient with hostname: 127.0.0.1, port: 4242

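What I am not sure about is how to tell whether my Node.js service is receiving the messages at all. I am new to Node.js, which does not help, but here is the snippet that sets up the listener: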

var flumeSink = require('http').createServer(flumeHandler);
flumeSink.listen(8000);
function flumeHandler (req, res) {
    console.log("Got it!");
    //var schema = avro.prepareSchema("string");
    //var buffer = schema.encode("foo");
    //var value = schema.decode(buffer);
}

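I suspect I have set up Node.js incorrectly. I am using the HTTP module, which may not be the right one. Perhaps I need to look into writing a custom receiver in Node.js? Pointers and help appreciated!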

1 Answer:

Answer 0: (score: 0)

The Avro sink is probably not what you need here, because it is designed for Flume-to-Flume communication (it is how you connect Flume agents into a topology).

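If you want a sink that is not in the standard list, you need to build a custom sink and configure it as described at https://flume.apache.org/FlumeUserGuide.html#custom-sink. That is what I tried, and it worked flawlessly.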

Or use something that already exists:

https://github.com/josealvarezmuguerza/flume-http-sink

I have never used this module; I just found it through a Google search.

For the Avro part, just use morphlines to convert the source to Avro, then post each event to the Node.js server.

Hope this helps.

Go code!