According to several sources, such as Hadoop Application Architecture, multiple sinks can read from the same channel to increase throughput.
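Beyond that, there is also the concept of a sink group that uses a load balancing sink processor. According to {{3}}, there is no need to create a sink group just to consume events faster: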
A sink can only fetch data from a single channel, but many sinks can fetch data from that same channel. A sink runs in a single thread, which has huge limitations on a single sink—for example, throughput to disk. Assume with HDFS you get 30 MBps to a single disk; if you only have one sink writing to HDFS then all you’re going to get is 30 MBps throughput with that sink. More sinks consuming from the same channel will resolve this bottleneck. The limitation with more sinks should be the network or the CPU. Unless you have a really small cluster, HDFS should never be your bottleneck.
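So I really don't understand when I should use a sink group with a load balancer, as opposed to simply adding more sinks that read from one particular channel.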
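For reference, the "just add more sinks" option might look roughly like this in an agent configuration. This is a minimal sketch, not a tested setup: the agent name a1, the component names r1, c1, k1 and k2, the netcat source, and the HDFS path are all placeholders chosen for illustration and do not come from any of the sources quoted above.

```properties
# Hypothetical agent "a1": one source, one channel, two sinks, no sink group.
a1.sources = r1
a1.channels = c1
a1.sinks = k1 k2

# Placeholder source; any source type would do here.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# Both sinks name the same channel, so they compete for its events.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.hdfs.filePrefix = k1

a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c1
a1.sinks.k2.hdfs.path = /flume/events
a1.sinks.k2.hdfs.filePrefix = k2
```

The distinct hdfs.filePrefix values are there so the two sinks do not write under the same file names.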
Answer 0 (score: 1)
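Multiple sinks can read from the same channel, but keep in mind that Flume only guarantees that each event is delivered to at least one sink, not to every attached sink. The sinks drain the channel at different speeds, and which sink will take a given event is unpredictable. If you need multiple sinks reading from the same channel, always use a failover or load balancing sink processor.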
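A sink group along those lines could be declared roughly as follows. This is only a sketch: the agent name a1 and the sink names k1 and k2 are placeholders, and both sinks are assumed to be already defined and attached to the same channel. The property names are the ones the Flume User Guide documents for sink processors; the priority and penalty values are arbitrary examples.

```properties
# Hypothetical agent "a1" with two already-configured sinks, k1 and k2.
a1.sinks = k1 k2
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2

# Load balancing: pick a sink per round-robin, back off from failing sinks.
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.selector = round_robin
a1.sinkgroups.g1.processor.backoff = true

# Failover alternative (use instead of the three lines above):
# the highest-priority sink is used until it fails, then the next one.
# a1.sinkgroups.g1.processor.type = failover
# a1.sinkgroups.g1.processor.priority.k1 = 10
# a1.sinkgroups.g1.processor.priority.k2 = 5
# a1.sinkgroups.g1.processor.maxpenalty = 10000
```

With load_balance the selector can also be set to random. With failover, each sink in the group needs its own priority value.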