Using Storm with Redis as a data source

Date: 2016-05-04 22:19:10

Tags: redis apache-storm jedis

I have a Storm topology that needs to stream data from a Redis instance. I tried running the topology reading from a single Redis instance, but it does not seem to read anything from Redis: when I inspect the returned queue, it is empty. I am using Storm version 0.9.3.

This is my RedisQueueSpout, a Storm spout that plugs the topology into Redis using a given pattern (a.k.a. key); every time Storm polls it, it looks up that key for input data. The spout emits a single field, with a message ID, to whatever bolts follow it.

package storm.starter.spout;

import java.util.List;
import java.util.Map;
import redis.clients.jedis.Jedis;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;


public class RedisQueueSpout extends BaseRichSpout {
  private static final long    serialVersionUID = 737015318988609460L;
  private SpoutOutputCollector _collector;
  private final String         host;
  private final int            port;
  private final String         pattern;
  private transient JedisQueue jq;

  public RedisQueueSpout(String host, int port, String pattern) {
    this.host = host;
    this.port = port;
    this.pattern = pattern;
  }

  @SuppressWarnings("rawtypes")
  public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
    _collector = collector;
    Jedis newJedis = new Jedis(host, port);
    newJedis.connect();
    this.jq = new JedisQueue(newJedis, pattern);
  }

  public void close() {}

  public void nextTuple() {
    List<String> ret = this.jq.dequeue();
    if (ret == null) {
      Utils.sleep(5L);
    }
    else {
      System.out.println(ret);
      _collector.emit(new Values(ret));
    }
  }

  @Override
  public void ack(Object msgId) {}

  @Override
  public void fail(Object msgId) {}

  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("name"));
  }
}

This is my JedisQueue, an implementation of a standard queue data structure backed by Redis. Note that the dequeue method somewhat unconventionally returns a List<String>, because that is what the underlying Jedis call returns: this is due to Redis being able to store many values under a single key.

package storm.starter;

import java.util.List;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.exceptions.JedisDataException;

public class JedisQueue {
  private transient Jedis jedis;
  private final String pattern;

  public JedisQueue(Jedis jedis, String pattern) {
    this.jedis = jedis;
    this.pattern = pattern;
  }

  public void clear() {
    this.jedis.del(this.pattern);
  }

  public boolean isEmpty() { 
    return (this.size() == 0);
  }

  public int size() {
    return this.jedis.llen(this.pattern).intValue();
  }

  public List<String> toArray() {
    return this.jedis.lrange(this.pattern, 0, -1);
  }

  public void enqueue(String... elems) {
    this.jedis.rpush(this.pattern, elems);
  }

  public List<String> dequeue() {
    List<String> out = null;
    try {
      out = this.jedis.blpop(0, this.pattern);
    }
    catch (JedisDataException e) {
      // It wasn't a list of strings
    }

    return out;
  }
}

The code comes from storm-jedis; see the link for details.

This is my topology:

package storm.starter;

import org.tomdz.storm.esper.EsperBolt;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;
import backtype.storm.utils.Utils;
import storm.starter.spout.RedisQueueSpout;

public class NameCountTopology {
  public static void main (String[] args) throws Exception {
    String host = "10.0.0.251";
    int port = 6379;
    String pattern = "Name:*";
    TopologyBuilder builder = new TopologyBuilder();

    EsperBolt bolt = new EsperBolt.Builder().inputs().aliasComponent("spout").toEventType("names").outputs()
            .onDefaultStream().emit("nps").statements()
            .add("select count(*) as nps from names.win:time_batch(1 sec)").build();

    builder.setSpout("spout", new RedisQueueSpout(host, port, pattern), 1);
    builder.setBolt("count-bolt", bolt, 1).fieldsGrouping("spout", new Fields("name"));


    Config conf = new Config();
    conf.setDebug(true);


    if (args != null && args.length > 0) {
        conf.setNumWorkers(1);

        StormSubmitter.submitTopology(args[0], conf, builder.createTopology());

    } else {

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("name-count-topology", conf, builder.createTopology());
        Utils.sleep(300000);
        cluster.killTopology("name-count-topology");
        cluster.shutdown();

    }
}

}

My Redis keys are stored with HMSET in the following format:

HMSET Name:1 NAME Mary YEAR 1880 GENDER F COUNT 7065
HMSET Name:2 NAME Anna YEAR 1880 GENDER F COUNT 2604
...

This is the log from my supervisor node:

2016-05-04 07:37:56 b.s.d.executor [INFO] Opened spout spout:(3) 
2016-05-04 07:37:56 b.s.d.executor [INFO] Activating spout spout:(3)
2016-05-04 07:37:56 STDIO [INFO] Queue is empty... 
2016-05-04 07:37:56 c.e.e.c.EPServiceProviderImpl [INFO] Initializing engine URI 'org.tomdz.storm.esper.EsperBolt@44d83ea0' version 4.3.0 
2016-05-04 07:37:58 b.s.d.executor [INFO] Prepared bolt count-bolt:(2)
2016-05-04 07:38:54 b.s.d.executor [INFO] Processing received message source: __system:-1, stream: __metrics_tick, id: {}, [60] 
2016-05-04 07:38:54 b.s.d.task [INFO] Emitting: __system __metrics [#<TaskInfo backtype.storm.metric.api.IMetricsConsumer$TaskInfo@70f9b3ee> [#<DataPoint [__ack-count = {}]> #<DataPoint [memory/heap = {unusedBytes=9418640, usedBytes=14710896, maxBytes=259522560, initBytes=8035520, virtualFreeBytes=244811664, committedBytes=24129536}]> #<DataPoint [__receive = {write_pos=1, read_pos=0, capacity=1024, population=1}]> #<DataPoint [__fail-count = {}]> #<DataPoint [__execute-latency = {}]> #<DataPoint [newWorkerEvent = 1]> #<DataPoint [__emit-count = {}]> #<DataPoint [__execute-count = {}]> #<DataPoint [__sendqueue = {write_pos=-1, read_pos=-1, capacity=1024, population=0}]> #<DataPoint [memory/nonHeap = {unusedBytes=1218808, usedBytes=36529928, maxBytes=224395264, initBytes=24313856, virtualFreeBytes=187865336, committedBytes=37748736}]> #<DataPoint [uptimeSecs = 77.358]> #<DataPoint [__transfer = {write_pos=0, read_pos=0, capacity=1024, population=0}]> #<DataPoint [startTimeSecs = 1.462347457159E9]> #<DataPoint [__process-latency = {}]> #<DataPoint [__transfer-count = {}]>]] 
2016-05-04 07:38:54 b.s.d.executor [INFO] Processing received message source: __system:-1, stream: __metrics_tick, id: {}, [60] 
2016-05-04 07:38:54 b.s.d.task [INFO] Emitting: __acker __metrics [#<TaskInfo backtype.storm.metric.api.IMetricsConsumer$TaskInfo@19940834> [#<DataPoint [__ack-count = {}]> #<DataPoint [__sendqueue = {write_pos=-1, read_pos=-1, capacity=1024, population=0}]> #<DataPoint [__receive = {write_pos=1, read_pos=0, capacity=1024, population=1}]> #<DataPoint [__process-latency = {}]> #<DataPoint [__transfer-count = {}]> #<DataPoint [__execute-latency = {}]> #<DataPoint [__fail-count = {}]> #<DataPoint [__emit-count = {}]> #<DataPoint [__execute-count = {}]>]]

And the log just keeps repeating. This is the UI after running the topology: storm UI

Now my question is: why is the spout not working and nothing being emitted? It seems nothing is being picked up from Redis.

PS: I have checked the host and port, and I can fetch data from Redis, so I don't think there is anything wrong with the connection to Redis.

1 Answer:

Answer (score: 1)

  1. HMSET is for hashes; BLPOP is for lists. They are not compatible.
  2. BLPOP does not take a pattern; it requires exact key names. See http://redis.io/commands/blpop for details.
  3. Since a spout executes the nextTuple(), ack(), and fail() methods from a single thread, a BLPOP with a long (or infinite) timeout will also block the spout unless a message becomes available to pop.
  4. Hope it helps.
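Putting points 1-3 together: if the names were loaded into a plain Redis list under an exact key (e.g. RPUSH names Mary) instead of HMSET, the spout could BLPOP with a short timeout. Jedis's blpop returns either null on timeout or a two-element list [keyName, poppedValue], so only the second element should be emitted. A minimal sketch, where the key name "names" and the helper method are illustrative, not from the original code:

```java
import java.util.Arrays;
import java.util.List;

public class BlpopResultDemo {
  // In nextTuple() the spout would call something like:
  //   List<String> ret = jedis.blpop(1, "names");  // 1-second timeout, exact key
  // and then emit only the popped value:
  //   if (ret != null) _collector.emit(new Values(extractValue(ret)));

  // BLPOP returns null on timeout, or [keyName, poppedValue] on success.
  static String extractValue(List<String> blpopResult) {
    if (blpopResult == null || blpopResult.size() < 2) {
      return null; // timed out or malformed reply: nothing to emit this round
    }
    return blpopResult.get(1); // element 0 is the key, element 1 is the value
  }

  public static void main(String[] args) {
    // Simulated BLPOP replies, shaped as Jedis would return them:
    System.out.println(extractValue(Arrays.asList("names", "Mary"))); // Mary
    System.out.println(extractValue(null)); // null (a timeout)
  }
}
```

Using a short timeout rather than 0 (which blocks forever) keeps nextTuple() from stalling the spout's single thread, per point 3 of the answer.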