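We are planning to build a real-time monitoring system with Apache Kafka. The overall idea is to push data from multiple data sources into Kafka and run data quality checks on it. I have a few questions about this architecture.
Please let me know your expert opinion. Thanks!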
Answer 0 (score: 1)
I think the best approach here is to use Kafka Connect: link
Note, however, that it is pull-based rather than push-based:
Kafka Connect sources are pull-based for a few reasons. First, although connectors should generally run continuously, making them pull-based means that the connector/Kafka Connect decides when data is actually pulled, which allows for things like pausing connectors without losing data, brief periods of unavailability as connectors are moved, etc. Second, in distributed mode the tasks that pull data may need to be rebalanced across workers, which means they won't have a consistent location or address. While in standalone mode you could guarantee a fixed network endpoint to work with (and point other services at), this doesn't work in distributed mode where tasks can be moving around between workers.
Ewen
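
To make the first point of the quote concrete, here is a minimal toy sketch in plain Python. It does not use the Kafka Connect API; `ToySource`, `ToyPullTask`, and their methods are hypothetical names invented for illustration. It assumes the source retains its records until they are read, and it shows that a task which tracks its own offset can be paused, or recreated elsewhere from that offset, without losing data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToySource:
    """Hypothetical external system that keeps records until they are read."""
    records: List[str] = field(default_factory=list)

    def read_from(self, offset: int, max_records: int) -> List[str]:
        # Return up to max_records records starting at the given offset.
        return self.records[offset:offset + max_records]


@dataclass
class ToyPullTask:
    """Hypothetical pull-based task: it decides when to read and tracks its offset."""
    source: ToySource
    offset: int = 0
    paused: bool = False

    def poll(self, max_records: int = 2) -> List[str]:
        if self.paused:
            # Nothing is read while paused, so nothing can be dropped.
            return []
        batch = self.source.read_from(self.offset, max_records)
        self.offset += len(batch)
        return batch


source = ToySource(["r0", "r1", "r2"])
task = ToyPullTask(source)

first = task.poll()           # pulls r0 and r1
task.paused = True
source.records.append("r3")   # new data arrives while the task is paused
during_pause = task.poll()    # empty batch, offset unchanged

# Simulate the task being moved: a new task resumes from the saved offset.
moved_task = ToyPullTask(source, offset=task.offset)
after_move = moved_task.poll()  # pulls r2 and r3
```

With a push-based design, the sender would need a stable address to deliver `r3` to, and would have to buffer or retry while the receiver was paused or being moved. In the pull model the only state that has to survive is the offset.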