I am using Flume to consume data from Kafka and write it to Hive with the HiveSink. When I used the HDFSSink, data was stored fast enough, but after I switched the sink to HiveSink, writing into the Hive warehouse became far too slow.
One factor may be that my source data is large (3 TB per day), but I do not understand why the HiveSink is this slow: it stores only about 0.3 TB per day into the Hive warehouse, whereas the HDFSSink stored the full 3 TB. I want to land the data in the Hive warehouse quickly.
There is also a large backlog of events waiting in the channel's data directory, so I believe the bottleneck is the HiveSink itself.
Any ideas on how to make it faster? Which configuration do I need to change?
//////////////////////
Here is my Flume sink configuration:
tier1.sinks.sink_flume_hive.type = hive
tier1.sinks.sink_flume_hive.channel = channel_flume_hive
tier1.sinks.sink_flume_hive.hive.metastore = thrift://
tier1.sinks.sink_flume_hive.hive.database = test
tier1.sinks.sink_flume_hive.hive.table = data_flume
tier1.sinks.sink_flume_hive.maxOpenConnections = 3000
tier1.sinks.sink_flume_hive.batchSize = 30000
tier1.sinks.sink_flume_hive.hive.txnsPerBatchAsk = 10000
tier1.sinks.sink_flume_hive.hive.partition = %Y%m%d,%H
tier1.sinks.sink_flume_hive.useLocalTimeStamp = false
tier1.sinks.sink_flume_hive.round = true
tier1.sinks.sink_flume_hive.roundValue = 3
tier1.sinks.sink_flume_hive.roundUnit = minute
tier1.sinks.sink_flume_hive.serializer = DELIMITED
tier1.sinks.sink_flume_hive.serializer.delimiter = "\t"
tier1.sinks.sink_flume_hive.serializer.serdeSeparator = '\t'
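For context on the setup above: the Flume HiveSink writes through Hive's streaming/transaction API, so it requires the target table to be transactional, bucketed, and stored as ORC. Below is a hedged sketch of what the DDL for a table like `test.data_flume` might look like under those constraints; the column names, bucket column, and bucket count are assumptions for illustration only, not taken from the question.

```sql
-- Hypothetical DDL sketch for a HiveSink-compatible target table.
-- Column names (id, payload) and bucket settings are illustrative assumptions.
CREATE TABLE test.data_flume (
  id STRING,
  payload STRING
)
PARTITIONED BY (dt STRING, hr STRING)   -- matches hive.partition = %Y%m%d,%H
CLUSTERED BY (id) INTO 8 BUCKETS        -- streaming ingest requires bucketing
STORED AS ORC                           -- streaming ingest requires ORC
TBLPROPERTIES ('transactional' = 'true');
```

If the table does not meet these requirements, or if transaction overhead (see `hive.txnsPerBatchAsk` and `batchSize` above) is poorly tuned, the sink can fall far behind the channel, which would match the backlog described.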