Question

I have batches of 5 tuples each, containing impressions from users:

Batch 1:
[UUID1, clientId1]
[UUID2, clientId1]
[UUID2, clientId1]
[UUID2, clientId1]
[UUID3, clientId2]

Batch 2:
[UUID4, clientId1]
[UUID5, clientId1]
[UUID5, clientId1]
[UUID6, clientId2]
[UUID6, clientId2]

This is how I persist the count state:

TridentState ClientState = impressionStream
    .groupBy(new Fields("clientId"))
    .persistentAggregate(getCassandraStateFactory("users", "DataComputation",
        "UserImpressionCounter"), new Count(), new Fields("count"));

Stream ClientStream = ClientState.newValuesStream();

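I start with a clean database and run my topology. After grouping the stream by clientId, I persist the state using persistentAggregate with the Count aggregator. For the first batch, the result after newValuesStream is [clientId1, 4], [clientId2, 1]. For the second batch it is [clientId1, 7], [clientId2, 3], as expected.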
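To make the expected numbers concrete, here is a minimal plain-Java sketch of the running count per clientId across the two batches. It is only a stand-in for groupBy plus persistentAggregate with Count: the class RunningCountSketch and its applyBatch method are hypothetical names, an in-memory map replaces the Cassandra-backed state, and no Storm API is used.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for groupBy("clientId") + persistentAggregate(Count).
// The map plays the role of the persistent state; nothing here is Storm API.
class RunningCountSketch {
    private final Map<String, Long> state = new LinkedHashMap<>();

    // Applies one batch of clientIds and returns the updated [clientId, count]
    // pairs, one per key touched in this batch.
    List<String> applyBatch(List<String> clientIds) {
        Map<String, Long> touched = new LinkedHashMap<>();
        for (String clientId : clientIds) {
            long updated = state.merge(clientId, 1L, Long::sum);
            touched.put(clientId, updated);
        }
        List<String> emitted = new ArrayList<>();
        for (Map.Entry<String, Long> e : touched.entrySet()) {
            emitted.add("[" + e.getKey() + ", " + e.getValue() + "]");
        }
        return emitted;
    }
}
```

Feeding it the clientIds of Batch 1 and then Batch 2 yields the same pairs as listed above, because the state carries over between batches.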

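ClientStream is used in several branches, and in one of them I need to process the tuples as if the batch size were 1, because I need the count information for every tuple. A batch size of 1 is obviously not practical, so I have to somehow find out the previous state of the counter before it is updated, and emit that value together with the tuple that carries the updated counter, e.g. [clientId1, 7, 4] for the second batch.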

Does anyone know how to do this?

Answer 1

I solved this by adding a new aggregator and joining its output with the persistent aggregate:

TridentState ClientState = impressionStream
    .groupBy(new Fields("clientId"))
    .persistentAggregate(getCassandraStateFactory("users", "DataComputation",
        "UserImpressionCounter"), new Count(), new Fields("count"));

Stream ClientBatchAggregationStream = impressionStream
    .groupBy(new Fields("clientId"))
    .aggregate(new SumCountAggregator(), new Fields("batchCount"));

Stream GroupingPeriodCounterStateStream = topology
    .join(ClientState.newValuesStream(), new Fields("clientId"),
        ClientBatchAggregationStream, new Fields("clientId"), 
        new Fields("clientId", "count", "batchCount"));

SumCountAggregator:

// Counts the tuples of each clientId group within a single batch.
public class SumCountAggregator extends BaseAggregator<SumCountAggregator.CountState> {

    // Mutable per-group accumulator, created fresh for every batch.
    static class CountState {
        long count = 0;
    }

    @Override
    public CountState init(Object batchId, TridentCollector collector) {
        return new CountState();
    }

    @Override
    public void aggregate(CountState state, TridentTuple tuple, TridentCollector collector) {
        // One impression per tuple.
        state.count += 1;
    }

    @Override
    public void complete(CountState state, TridentCollector collector) {
        // Emit the per-batch total as the "batchCount" field.
        collector.emit(new Values(state.count));
    }

}
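With both count and batchCount on the joined tuple, the counter value before the update follows by subtraction. The plain-Java sketch below shows only that arithmetic; PreviousCountSketch and its methods are hypothetical names, not part of Storm. In the topology this step would presumably be a Trident function applied to the joined stream with each, which is an assumption and not something shown in the code above.

```java
// Hypothetical helper, not part of Storm: derives the pre-update counter
// from the two fields produced by the join (count, batchCount).
class PreviousCountSketch {

    // The persistent counter already includes this batch, so remove it again.
    static long previousCount(long count, long batchCount) {
        return count - batchCount;
    }

    // Builds the [clientId, count, previousCount] triple asked for in the question.
    static String withPrevious(String clientId, long count, long batchCount) {
        return "[" + clientId + ", " + count + ", "
                + previousCount(count, batchCount) + "]";
    }
}
```

For the second batch, clientId1 has count 7 and batchCount 3, so withPrevious("clientId1", 7, 3) gives [clientId1, 7, 4], matching the example in the question.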
