Question

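I have data in a topic and need to do multi-level counting on it, but all the code and articles I can find only cover the word count example.

An example of the data: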

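serial: 123 country: US date: 01/05/2018 state: New York city: New York visitors: 5

serial: 123 country: US date: 01/06/2018 state: New York city: Queens visitors: 10

serial: 456 date: 01/06/2018 country: US state: New York city: Queens visitors: 27

serial: 123 date: 01/06/2018 country: US state: New York city: New York visitors: 867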

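I have the filter and the groupBy done, but what about the aggregate? Sorry for the mix of Java 8 and older-style Java; I prefer 8, but I am learning it as I go.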

KTable<String, CountryVisitorModel> countryStream1 = inStream
    .filter((key, value) -> value.status.equalsIgnoreCase("TEST_DATA"))
    .groupBy((key, value) -> value.serial)
    .aggregate(
        new Initializer<CountryVisitorModel>() {
            @Override
            public CountryVisitorModel apply() {
                return new CountryVisitorModel();
            }
        },
        new Aggregator<String, InputModel, CountryVisitorModel>() {
            @Override
            public CountryVisitorModel apply(String key, InputModel value, CountryVisitorModel aggregate) {
                // Copy the identifying fields from the latest record.
                aggregate.serial = value.serial;
                aggregate.country_name = value.country_name;
                aggregate.city_name = value.city_name;

                // Every record bumps each counter by one.
                aggregate.country_count++;
                aggregate.city_count++;
                aggregate.ip_count++;

                return aggregate;
            }
        },
        Materialized.with(stringSerde, visitorSerde));

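For all records with the same serial_id (this is the group-by key), I want to compute the total number of visitors, with output columns like this:

serial country state city total_num_visitors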
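To make the desired result concrete, here is a minimal plain-Java sketch (no Kafka involved) that sums visitors over the four sample records at different grouping levels. The class `VisitorTotals`, its nested `Visit` type, and the `level` parameter are hypothetical names introduced for illustration; the field values are taken from the sample data above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the requested totals; not Kafka Streams code.
class VisitorTotals {

    // Hypothetical record type mirroring the fields in the sample data.
    static class Visit {
        final String serial, country, state, city;
        final long visitors;

        Visit(String serial, String country, String state, String city, long visitors) {
            this.serial = serial;
            this.country = country;
            this.state = state;
            this.city = city;
            this.visitors = visitors;
        }
    }

    // The four sample records from the question.
    static List<Visit> sampleVisits() {
        List<Visit> visits = new ArrayList<>();
        visits.add(new Visit("123", "US", "New York", "New York", 5));
        visits.add(new Visit("123", "US", "New York", "Queens", 10));
        visits.add(new Visit("456", "US", "New York", "Queens", 27));
        visits.add(new Visit("123", "US", "New York", "New York", 867));
        return visits;
    }

    // Sums visitors per key. `level` picks how many fields form the key:
    // 1 = serial, 2 = serial|country, 3 = ...|state, 4 = ...|city.
    static Map<String, Long> totals(List<Visit> visits, int level) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Visit v : visits) {
            String[] fields = {v.serial, v.country, v.state, v.city};
            String key = String.join("|", Arrays.copyOf(fields, level));
            result.merge(key, v.visitors, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Visit> visits = sampleVisits();
        System.out.println(totals(visits, 1)); // {123=882, 456=27}
        System.out.println(totals(visits, 4));
    }
}
```

In a Kafka Streams topology, the same idea would presumably map to choosing the grouping key in `groupBy` and adding `value.visitors` to a running total inside the aggregator, rather than incrementing a counter by one per record.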

Answer 1

If each record contributes to only one count, I would suggest that you branch() the stream and count per sub-stream:

KStream stream = builder.stream(...);
KStream[] subStreams = stream.branch(...);

// each record of `stream` will be contained in exactly _one_ `substream`
subStreams[0].groupByKey().count(); // or aggregate() instead of count()
subStreams[1].groupByKey().count();
// ...
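The routing rule behind branch() can be modeled in plain Java, which may help when deciding whether it fits. This is only a sketch of the semantics, not the Kafka Streams API: `BranchModel`, `branch`, and `demo` are hypothetical names. It assumes the documented behavior that a record goes to the first predicate it matches and is dropped if it matches none.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model of branch-style routing; not Kafka Streams code.
class BranchModel {

    // Sends each record to the sub-stream of the FIRST matching predicate.
    // Records that match no predicate are dropped.
    @SafeVarargs
    static <T> List<List<T>> branch(List<T> records, Predicate<T>... predicates) {
        List<List<T>> subStreams = new ArrayList<>();
        for (int i = 0; i < predicates.length; i++) {
            subStreams.add(new ArrayList<>());
        }
        for (T record : records) {
            for (int i = 0; i < predicates.length; i++) {
                if (predicates[i].test(record)) {
                    subStreams.get(i).add(record);
                    break; // first match wins, so no record is counted twice
                }
            }
        }
        return subStreams;
    }

    // Splits the sample city names into "Queens" and "everything else".
    static List<List<String>> demo() {
        List<String> cities = Arrays.asList("New York", "Queens", "Queens", "New York");
        return branch(cities, c -> c.equals("Queens"), c -> true);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [[Queens, Queens], [New York, New York]]
    }
}
```

Because the first match wins, the order of the predicates matters: a catch-all predicate placed first would swallow every record.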

If branching does not work because a single record needs to contribute to multiple counts, you can "broadcast" and filter:

KStream stream = builder.stream(...);

// each record in `stream` will be "duplicated" and sent to all `filters`
stream.filter(...).groupByKey().count(); // or aggregate() instead of count()
stream.filter(...).groupByKey().count();
// ...

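If you use the same KStream object multiple times and apply multiple operators to it (in our case, filter()), each record is "broadcast" to all of those operators. Note that records (i.e., objects) are not physically copied in this case; instead, the same input record object is used to call each filter().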
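The difference from branching can also be modeled in plain Java. Again, this is a sketch of the semantics only, not Kafka Streams code; `BroadcastModel`, `broadcast`, `demo`, and the nested `Visit` type are hypothetical names. Every filter sees every record, a record may pass several filters, and the outputs share the same object instance instead of holding copies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model of "broadcast and filter"; not Kafka Streams code.
class BroadcastModel {

    // Hypothetical record type with just the fields the filters need.
    static class Visit {
        final String serial, city;

        Visit(String serial, String city) {
            this.serial = serial;
            this.city = city;
        }
    }

    // Applies EVERY filter to EVERY record; a record can land in several outputs.
    @SafeVarargs
    static <T> List<List<T>> broadcast(List<T> records, Predicate<T>... filters) {
        List<List<T>> outputs = new ArrayList<>();
        for (Predicate<T> filter : filters) {
            List<T> passed = new ArrayList<>();
            for (T record : records) {
                if (filter.test(record)) {
                    passed.add(record); // same reference, no copy is made
                }
            }
            outputs.add(passed);
        }
        return outputs;
    }

    // Output 0: serial 123. Output 1: city Queens. The 123/Queens record is in both.
    static List<List<Visit>> demo() {
        List<Visit> visits = Arrays.asList(
                new Visit("123", "New York"),
                new Visit("123", "Queens"),
                new Visit("456", "Queens"));
        return broadcast(visits, v -> v.serial.equals("123"), v -> v.city.equals("Queens"));
    }

    public static void main(String[] args) {
        List<List<Visit>> out = demo();
        System.out.println(out.get(0).size()); // 2
        System.out.println(out.get(1).size()); // 2
        // The 123/Queens record is the very same object in both outputs.
        System.out.println(out.get(0).get(1) == out.get(1).get(0)); // true
    }
}
```

Since the object is shared, mutating a record inside one operator would be visible to the others, which is a reason to treat records as read-only in filters.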

Complex aggregation
