Can Storm's HdfsBolt flush data after a timeout as well?

Time: 2015-10-07 21:10:39

Tags: bigdata apache-storm

We are using Storm to process streaming data and store it into HDFS. We have everything working, but one concern remains. I understand that we can specify the number of tuples after which the data gets flushed to HDFS using a SyncPolicy, something like this:

SyncPolicy syncPolicy = new CountSyncPolicy(Integer.parseInt(args[3]));

My question is, can the data also be flushed after a timeout? For example, say we have set the SyncPolicy above to 1000 tuples. If for some reason we get 995 tuples and then the data stops coming in for a while, is there any way Storm can flush the 995 records to HDFS after a specified timeout (say, 5 seconds)?

Thanks in advance for any help!

  • Shay

2 answers:

Answer 0 (score: 1)

Yes, if you send a tick tuple to the HDFS bolt, it will cause the bolt to try to sync to the HDFS file system. All of this happens in the HDFS bolt's execute function.

You configure tick tuples for the topology in the topology configuration. In Java, to set the frequency to every 300 seconds the code would look like:

Config topologyConfig = new Config();
topologyConfig.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 300);

StormSubmitter.submitTopology("mytopology", topologyConfig, builder.createTopology());

You will have to adjust that last line to fit your circumstances.
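
If a topology-wide tick rate is too coarse, Storm also lets an individual bolt request its own tick frequency by overriding getComponentConfiguration(). A minimal sketch of that override, placed inside the bolt class:

// Ask Storm to deliver a tick tuple to this bolt every 300 seconds,
// independent of any topology-wide setting.
@Override
public Map<String, Object> getComponentConfiguration() {
    Config conf = new Config();
    conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 300);
    return conf;
}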

Answer 1 (score: 1)

There is an alternative solution to this problem.

First, let's clarify the sync policy. If your sync policy is 1000, then HdfsBolt only syncs the data after every 1000 tuples, by calling the hsync() method inside execute(). That means it merely clears the buffer by pushing the data down to disk; for faster writes, the disk may use its cache rather than writing straight to the file.

The data is written to the file only when its size matches the rotation policy, which you need to specify when creating the bolt:

 FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(100.0f, Units.KB);
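
Both policies are supplied together when the bolt is built. Here is a sketch using the stock storm-hdfs builder API (the fsUrl, output path, and delimiter below are placeholders):

SyncPolicy syncPolicy = new CountSyncPolicy(1000);  // hsync after every 1000 tuples
FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(100.0f, Units.KB);  // new file per 100 KB

HdfsBolt bolt = new HdfsBolt()
        .withFsUrl("hdfs://namenode:8020")  // placeholder NameNode URI
        .withFileNameFormat(new DefaultFileNameFormat().withPath("/storm/"))
        .withRecordFormat(new DelimitedRecordFormat().withFieldDelimiter(","))
        .withSyncPolicy(syncPolicy)
        .withRotationPolicy(rotationPolicy);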

So, to flush the records to the file after a timeout, separate the tick tuples from the normal tuples in the execute() method and track the time difference between the two; if the difference exceeds your timeout, write the data out to the file.

By handling tick tuples separately, you also avoid having the tick tuples themselves written into your file.

See the following code for a better understanding:

// Imports assume Storm 0.x, where the core classes live under backtype.storm;
// in Storm 1.x+ they move to org.apache.storm.*.
import java.io.IOException;
import java.net.URI;
import java.util.Calendar;
import java.util.EnumSet;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream.SyncFlag;
import org.apache.storm.hdfs.bolt.AbstractHdfsBolt;
import org.apache.storm.hdfs.bolt.format.FileNameFormat;
import org.apache.storm.hdfs.bolt.format.RecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy;
import org.apache.storm.hdfs.bolt.sync.SyncPolicy;
import org.apache.storm.hdfs.common.rotation.RotationAction;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import backtype.storm.Constants;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.tuple.Tuple;

public class CustomHdfsBolt1 extends AbstractHdfsBolt {

private static final Logger LOG = LoggerFactory.getLogger(CustomHdfsBolt1.class);
private transient FSDataOutputStream out;
private RecordFormat format;
private long offset = 0L;
private String type;
private long normalTupleTime;
private long tickTupleTime;

public CustomHdfsBolt1() {

}


public CustomHdfsBolt1(String type) {
    this.type = type;
}

public CustomHdfsBolt1 withFsUrl(String fsUrl) {
    this.fsUrl = fsUrl;
    return this;
}

public CustomHdfsBolt1 withConfigKey(String configKey) {
    this.configKey = configKey;
    return this;
}

public CustomHdfsBolt1 withFileNameFormat(FileNameFormat fileNameFormat) {
    this.fileNameFormat = fileNameFormat;
    return this;
}

public CustomHdfsBolt1 withRecordFormat(RecordFormat format) {
    this.format = format;
    return this;
}

public CustomHdfsBolt1 withSyncPolicy(SyncPolicy syncPolicy) {
    this.syncPolicy = syncPolicy;
    return this;
}

public CustomHdfsBolt1 withRotationPolicy(FileRotationPolicy rotationPolicy) {
    this.rotationPolicy = rotationPolicy;
    return this;
}

public CustomHdfsBolt1 addRotationAction(RotationAction action) {
    this.rotationActions.add(action);
    return this;
}

protected static boolean isTickTuple(Tuple tuple) {
    return tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)
            && tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID);
}


public void execute(Tuple tuple) {
    try {

        if (isTickTuple(tuple)) {
            tickTupleTime = Calendar.getInstance().getTimeInMillis();

            // Elapsed time since the last normal tuple arrived.
            long timeDiff = tickTupleTime - normalTupleTime;

            long diffInSeconds = TimeUnit.MILLISECONDS.toSeconds(timeDiff);

            // Flush only when there is buffered data, so a long quiet
            // period does not rotate out a chain of empty files.
            if (diffInSeconds > 5 && this.offset > 0) { // specify the timeout you want
                this.rotateWithOutFileSize(tuple);
            }

        } else {

            normalTupleTime = Calendar.getInstance().getTimeInMillis();
            this.rotateWithFileSize(tuple);
        }
    } catch (IOException e) {
        LOG.warn("write/sync failed.", e);
        this.collector.fail(tuple);
    }

}


// Normal-tuple path: write and sync per the sync policy, then rotate
// only when the size-based rotation policy is met.
public void rotateWithFileSize(Tuple tuple) throws IOException {

    syncHdfs(tuple);

    this.collector.ack(tuple);

    if (this.rotationPolicy.mark(tuple, this.offset)) {
        this.rotateOutputFile();
        this.offset = 0L;
        this.rotationPolicy.reset();
    }
}


// Timeout path (tick tuple): flush whatever is buffered without writing
// the tick tuple itself into the file, then rotate unconditionally.
public void rotateWithOutFileSize(Tuple tuple) throws IOException {

    synchronized (this.writeLock) {
        if (this.out instanceof HdfsDataOutputStream) {
            ((HdfsDataOutputStream) this.out).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
        } else {
            this.out.hsync();
        }
        this.syncPolicy.reset();
    }

    this.collector.ack(tuple);

    this.rotateOutputFile();
    this.offset = 0L;
    this.rotationPolicy.reset();

}

// Write the tuple's bytes into the open file and hsync to HDFS
// whenever the sync policy fires.
public void syncHdfs(Tuple tuple) throws IOException {
    byte[] bytes = this.format.format(tuple);

    synchronized (this.writeLock) {
        this.out.write(bytes);
        this.offset += (long) bytes.length;
        if (this.syncPolicy.mark(tuple, this.offset)) {
            if (this.out instanceof HdfsDataOutputStream) {
                ((HdfsDataOutputStream) this.out).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
            } else {
                this.out.hsync();
            }

            this.syncPolicy.reset();
        }
    }


}

public void closeOutputFile() throws IOException {
    this.out.close();
}

public void doPrepare(Map conf, TopologyContext topologyContext, OutputCollector collector) throws IOException {
    LOG.info("Preparing HDFS Bolt...");
    this.fs = FileSystem.get(URI.create(this.fsUrl), this.hdfsConfig);
    this.normalTupleTime = 0;
    this.tickTupleTime = 0;

}

public Path createOutputFile() throws IOException {
    Path path = new Path(this.fileNameFormat.getPath(),
            this.fileNameFormat.getName((long) this.rotation, System.currentTimeMillis()));
    this.out = this.fs.create(path);
    return path;
}
}

You can use this class directly in your project.
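
To wire it up, the custom bolt takes the same builder calls as the stock HdfsBolt, plus a tick frequency so the timeout check in execute() actually runs. A hypothetical wiring sketch (the URI, path, and delimiter are placeholders; the tick interval is 1 second so the 5-second check fires promptly):

CustomHdfsBolt1 hdfsBolt = new CustomHdfsBolt1()
        .withFsUrl("hdfs://namenode:8020")
        .withFileNameFormat(new DefaultFileNameFormat().withPath("/storm/"))
        .withRecordFormat(new DelimitedRecordFormat().withFieldDelimiter(","))
        .withSyncPolicy(new CountSyncPolicy(1000))
        .withRotationPolicy(new FileSizeRotationPolicy(100.0f, Units.KB));

Config topologyConfig = new Config();
topologyConfig.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 1);

The chained with* calls work because each one returns the CustomHdfsBolt1 instance itself.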

Thanks,