Question

I have a situation where I need to enrich multiple (15) fields with other data.

The data looks like this:

class Company
    String abbreviation "abc"
    String fullName "<to be enriched by looking up via abbreviation>"
    List<Employee> employees;
    List<Department> departments;

class Employee
    String surname;
    String lastname;
    int someKey
    String someKeyEnriched "<to be enriched by looking up via someKey>"
    int secondKey
    String secondKeyEnriched "<to be enriched by looking up via secondKey>"

class Department
    int anotherKey
    String anotherKeyEnriched "<to be enriched by looking up via anotherKey>"

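We implemented a service using Kafka Streams. It reads entries from a company topic and enriches them with data from other topics.

The other enrichment topics contain entries with Avro messages as values, e.g. code = "String", name = "String" and other fields. For example: "abc", "Full name of company abc". There are 15 enrichment topics, and they are updated every few months by another process. The data is small enough to be kept in memory (> 1 mio).

We found a similar question about whether to keep such enrichment key-values in a plain map or to join them with the stream, and then went for the cheap solution of using a map.

The code was quite simple, and the enrichment looks like this: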

Topology.java ...

    final var stream = streamsBuilder.stream("NotYetEnrichedCompany", Consumed.with(Serdes.String(), new CompanySerde()));
    Produced<String, Company> companyStream = KafkaStreamsConfig.createCompanyProduced();
    stream
      .mapValues(companyEnricher::enrich)
      .to("EnrichedCompany", companyStream);


public class CompanyEnricher implements KeyValueEnricher<Company> {

  @Autowired
  private EnrichmentService enrichmentService;

  @Override
  public Company enrich(final Company company) {
    enrichAbbreviation(company);
    enrichEmployees(company);
    enrichDepartments(company);
    return company;
  }

  private void enrichEmployees(final Company company) {
    for (final var employee : company.getEmployees()) {
      enrichSomeKey(employee);
      enrichSecondKey(employee);
    }
  }

  private void enrichSomeKey(final Employee employee) {
    final var key = employee.getSomeKey();
    enrichmentService.get("SomeKeyTopic", key).ifPresent(employee::setSomeKeyEnriched);
  }

...

}

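We were happy, until we realized that the enrichment did not work for the earliest company entries: they were processed before the service had started reading the enrichment data. The Stack Overflow question linked above mentions this as well.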
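The ordering problem can be reproduced without Kafka. The following is a minimal sketch, assuming the enrichment data lives in one in-memory map per topic. The question does not show the internals of EnrichmentService, so `InMemoryEnrichmentService`, its `put` method, and the sample values are hypothetical stand-ins, and codes are simplified to strings.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the EnrichmentService from the question:
// one in-memory map per enrichment topic, filled by a separate consumer.
class InMemoryEnrichmentService {

  // topic name -> (code -> name)
  private final Map<String, Map<String, String>> tables = new ConcurrentHashMap<>();

  // Would be called whenever a record from an enrichment topic arrives.
  void put(final String topic, final String code, final String name) {
    tables.computeIfAbsent(topic, t -> new ConcurrentHashMap<>()).put(code, name);
  }

  // Returns the enriched name, or empty if the code is unknown or not loaded yet.
  Optional<String> get(final String topic, final String code) {
    return Optional.ofNullable(tables.getOrDefault(topic, Map.of()).get(code));
  }
}

class EnrichmentRaceDemo {
  public static void main(final String[] args) {
    final InMemoryEnrichmentService service = new InMemoryEnrichmentService();

    // A company record is processed before the enrichment topic has been read:
    System.out.println(service.get("SomeKeyTopic", "42").isPresent()); // false

    // The enrichment record arrives too late for that company record:
    service.put("SomeKeyTopic", "42", "Some key 42");
    System.out.println(service.get("SomeKeyTopic", "42").isPresent()); // true
  }
}
```

Nothing in this sketch forces the maps to be filled before the first lookup, which matches the behaviour we observed.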

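We then wanted to rewrite it using Kafka Streams joins, to make sure that the most current data is always used for the enrichment. The code got complicated: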

Topology.java ...

companyAbbreviationTable = streamsBuilder.globalTable("company.abbreviation", consumed);
employeeAbbreviationTable = streamsBuilder.globalTable("employee.abbreviation", consumed);
...

    final var stream = streamsBuilder.stream("NotYetEnrichedCompany", Consumed.with(Serdes.String(), new CompanySerde()));
    Produced<String, Company> companyStream = KafkaStreamsConfig.createCompanyProduced();
    stream
      .leftJoin(companyAbbreviationTable,
        (companyId, company) -> company.getAbbreviation(),
        (company, fullName) -> {
          if (fullName != null) {
            company.setFullName(fullName.get("name").toString());
          }
          return company;
        })
      .flatMapValues(Company::getEmployees)
      .leftJoin(employeeAbbreviationTable,
        (employeeId, employee) -> employee.getSomeKey(),
        (employee, fullName) -> {
          if (fullName != null) {
            employee.setSomeKeyEnriched(fullName.get("name").toString());
          }
          return employee;
        })
    // Question 1: How can I get the surrounding company object again to persist it in the "EnrichedCompany" stream?
      .to("EnrichedCompany", companyStream);

If I have to replace 15 values with joins, the code gets very large and also becomes hard to verify.

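Question 1: How can I get back the original company instance after iterating over the employees with .flatMapValues(Company::getEmployees), so that the enriched company instance can be streamed with .to("EnrichedCompany", companyStream);?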
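To make the shape of the problem concrete, here is a plain-Java sketch (no Kafka involved) of flattening and regrouping. It rests on one fact about the API: flatMapValues keeps the record key, so every flattened employee still carries its company key even though the company object is gone. The names `KeyedEmployee`, `flatten` and `regroup` are hypothetical; in a real topology the regrouping would presumably be a groupByKey followed by an aggregate, which emits updates rather than one final company.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FlattenAndRegroup {

  // One flattened element: an employee plus the key of the company it came from.
  static final class KeyedEmployee {
    final String companyKey;
    final String surname;

    KeyedEmployee(final String companyKey, final String surname) {
      this.companyKey = companyKey;
      this.surname = surname;
    }
  }

  // Models the flattening step: one output element per employee, same key.
  static List<KeyedEmployee> flatten(final String companyKey, final List<String> surnames) {
    final List<KeyedEmployee> result = new ArrayList<>();
    for (final String surname : surnames) {
      result.add(new KeyedEmployee(companyKey, surname));
    }
    return result;
  }

  // Models the regrouping step: collect the employees per company key again.
  static Map<String, List<String>> regroup(final List<KeyedEmployee> flattened) {
    final Map<String, List<String>> byCompany = new LinkedHashMap<>();
    for (final KeyedEmployee e : flattened) {
      byCompany.computeIfAbsent(e.companyKey, k -> new ArrayList<>()).add(e.surname);
    }
    return byCompany;
  }

  public static void main(final String[] args) {
    final List<KeyedEmployee> flat = new ArrayList<>();
    flat.addAll(flatten("c1", List.of("Smith", "Jones")));
    flat.addAll(flatten("c2", List.of("Lee")));
    System.out.println(regroup(flat)); // {c1=[Smith, Jones], c2=[Lee]}
  }
}
```

Only the employees come back per key in this sketch; the other company fields would have to be carried along in the flattened elements or joined back in, which is the part I do not know how to do cleanly.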

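Question 2: I also found Kafka's interactive queries.

Is it possible to reference the 15 ReadOnlyKeyValueStore instances in the CompanyEnricher class via interactive queries and use the initial solution again?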

    stream
      .mapValues(companyEnricher::enrich)
      .to("EnrichedCompany", companyStream);

Would interactive queries solve my initial problem, so that the most current data is used to enrich the company?
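What I have in mind is roughly the following sketch, in which the enricher only depends on a lookup function. The nested `Company`, `Employee` and `CompanyEnricher` classes are simplified, hypothetical versions of the ones above, and a `HashMap` stands in for the store. The assumption is that a `ReadOnlyKeyValueStore` could be plugged in as `store::get`, since its `get(key)` also returns null for unknown keys. Whether the store would already be fully loaded when the first company record is processed is exactly what I am unsure about.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

class LookupEnricherSketch {

  // Simplified versions of the classes from the question.
  static final class Employee {
    final int someKey;
    String someKeyEnriched;

    Employee(final int someKey) {
      this.someKey = someKey;
    }
  }

  static final class Company {
    final List<Employee> employees = new ArrayList<>();
  }

  // The enricher only sees a lookup function, not where the data lives.
  static final class CompanyEnricher {
    private final Function<Integer, String> someKeyLookup;

    CompanyEnricher(final Function<Integer, String> someKeyLookup) {
      this.someKeyLookup = someKeyLookup;
    }

    // Enriches the nested employees in place and returns the same company,
    // so the surrounding object is never lost.
    Company enrich(final Company company) {
      for (final Employee employee : company.employees) {
        // A store-style get returns null for unknown keys, so guard against it.
        Optional.ofNullable(someKeyLookup.apply(employee.someKey))
            .ifPresent(name -> employee.someKeyEnriched = name);
      }
      return company;
    }
  }

  public static void main(final String[] args) {
    final Map<Integer, String> fakeStore = new HashMap<>(); // stand-in for a state store
    fakeStore.put(7, "Seven");

    final Company company = new Company();
    company.employees.add(new Employee(7));
    company.employees.add(new Employee(8));

    new CompanyEnricher(fakeStore::get).enrich(company);
    System.out.println(company.employees.get(0).someKeyEnriched); // Seven
    System.out.println(company.employees.get(1).someKeyEnriched); // null
  }
}
```

With this shape the topology would stay at a single mapValues call, and the enricher could be unit-tested with plain maps.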

Question 3: Is there a better pattern for data enrichment in my case?

Kafka Streams: How to enrich multiple fields in nested objects with a one-to-many relationship

0 answers