One-to-many KStream-KTable join

Time: 2018-03-13 15:03:21

Tags: apache-kafka apache-kafka-streams

I have a KStream of universities, where a University is:

University(universityId: String, name: String, studentIds: Seq[String])

val universityKStream = builder.stream[String, University](...)

and a KTable of students, where a Student is:

Student(studentId: String, name: String)

val studentsKtable = builder.table[String, Student](...)

I want to join the two and produce a topic of ResolvedUniversity objects:

ResolvedUniversity(universityId: String, name: String, students: Seq[Student])

I can't groupBy the students by universityId, because the universityId field does not exist in the Student object.

1 answer:

Answer 0 (score: 3)

Using only the DSL, I think the simplest thing you can do is (Java):

class Student {
    String studentId;
    String name;
}
class University {
    String universityId;
    String name;
    List<String> studentIds;
}
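class ResolvedUniversity {
    String universityId;
    String name;
    // Initialized (java.util.ArrayList) so the aggregator below can call
    // students.add(...) on a freshly created instance without a NullPointerException.
    List<Student> students = new ArrayList<>();
}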
// Placeholders only: supply real serdes for each type before running.
Serde<String> stringSerde = null;
Serde<Student> studentSerde = null;
Serde<University> universitySerde = null;
Serde<ResolvedUniversity> resolvedUniversitySerde = null;

// "topology" is assumed here to be a StreamsBuilder instance.
KStream<String, University> universities = topology
  .stream("universities", Consumed.with(stringSerde, universitySerde));

KTable<String, Student> students = topology
  .table("students", Consumed.with(stringSerde, studentSerde));

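KTable<String, ResolvedUniversity> resolvedUniversities = universities
  // Emit one record per student, re-keyed by studentId so it matches the table key.
  .flatMap((k, v) -> {
      return v.studentIds.stream()
        .map(id -> new KeyValue<>(id, v))
        .collect(Collectors.toList());
  })
  // Look up each student; Pair is a helper type (not shown) holding both sides.
  .join(students, Pair::pair, Joined.with(stringSerde, universitySerde, studentSerde))
  // Re-key by universityId. This repartitions the data, so a serde for the Pair
  // value must be available (default serde config or an explicit argument).
  .groupBy((k, v) -> v.left().universityId)
  // Collect the joined students back into one ResolvedUniversity per university.
  .aggregate(ResolvedUniversity::new,
             (k, v, a) -> {
                 a.universityId = v.left().universityId;
                 a.name = v.left().name;
                 a.students.add(v.right());
                 return a;
             },
             Materialized.with(stringSerde, resolvedUniversitySerde));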
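
The snippet above uses a Pair type with a static pair(...) factory and left()/right() accessors, but the answer does not show it. A minimal sketch of what such a helper might look like follows; the class and its method names are assumptions inferred from how the snippet calls it, not part of Kafka Streams.

```java
// Hypothetical helper: the answer's code assumes a type like this exists.
final class Pair<L, R> {
    private final L left;
    private final R right;

    private Pair(L left, R right) {
        this.left = left;
        this.right = right;
    }

    // Static two-argument factory, so Pair::pair can be passed where a
    // two-argument joiner function is expected.
    static <L, R> Pair<L, R> pair(L left, R right) {
        return new Pair<>(left, right);
    }

    // Left side of the join (the University in the snippet above).
    L left() { return left; }

    // Right side of the join (the Student in the snippet above).
    R right() { return right; }
}
```

Because the groupBy step repartitions the joined records, a serde for this type would also have to be written and made available; that part is not sketched here.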

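With this kind of join, for historical processing, the students KTable has to be "primed" with its data before the university KStream records are joined. A stream-table join is triggered only by records on the stream side and looks up whatever the table contains at that moment, so a university record that arrives before its students are loaded finds nothing to join with.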
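
Why the ordering matters can be illustrated with a small plain-Java model of an inner stream-table join. This is only a sketch of the semantics, not Kafka Streams code; the class StreamTableJoinModel and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of an inner stream-table join; not the Kafka Streams API.
final class StreamTableJoinModel {
    // Current state of the "table": latest student name per studentId.
    private final Map<String, String> table = new HashMap<>();
    // Join results emitted so far, formatted as "universityId:studentName".
    private final List<String> output = new ArrayList<>();

    // A table update only changes the lookup state; it emits nothing.
    void onTableRecord(String studentId, String studentName) {
        table.put(studentId, studentName);
    }

    // A stream record is joined against the table as it is right now.
    void onStreamRecord(String studentId, String universityId) {
        String studentName = table.get(studentId);
        if (studentName != null) { // inner join: no match means no output
            output.add(universityId + ":" + studentName);
        }
    }

    List<String> output() {
        return output;
    }
}
```

In this model, calling onStreamRecord("s1", "u1") before onTableRecord("s1", "Alice") emits nothing, and the later table update does not retroactively produce the missing result. Only a stream record that arrives after the table entry exists yields "u1:Alice".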