To store processed records, I use HiveBolt in my Storm topology with the following options.
- id: "MyHiveOptions"
  className: "org.apache.storm.hive.common.HiveOptions"
  constructorArgs:
    - "${metastore.uri}"   # metaStoreURI
    - "${hive.database}"   # databaseName
    - "${hive.table}"      # tableName
  configMethods:
    - name: "withTxnsPerBatch"
      args:
        - 2
    - name: "withBatchSize"
      args:
        - 100
    - name: "withIdleTimeout"
      args:
        - 2       # default value 0
    - name: "withMaxOpenConnections"
      args:
        - 200     # default value 500
    - name: "withCallTimeout"
      args:
        - 30000   # default value 10000
    - name: "withHeartBeatInterval"
      args:
        - 240     # default value 240
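Transactions are missing in Hive because a batch is not completed before the records are flushed. (For example: 1330 records were processed, but only 1200 are in Hive. 130 records are missing.)
How can I overcome this? How can I fill the batch so that the transaction is triggered and the records are stored in Hive?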
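A minimal sketch of the suspected behavior, assuming a buffer that commits only once it holds a full batch. `BatchSketch` and `SizeBatcher` are hypothetical names made up for illustration. They are not part of storm-hive and do not model HiveBolt's actual internals, so the counts differ from the 1200 of 1330 reported above. The point is only that a partially filled batch stays pending until something other than the size threshold forces a flush.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {

    /** Hypothetical buffer that commits only when batchSize records are pending. */
    public static class SizeBatcher {
        private final int batchSize;
        private final List<String> pending = new ArrayList<>();
        private int committed = 0;

        public SizeBatcher(int batchSize) {
            this.batchSize = batchSize;
        }

        /** Buffers one record and commits automatically once the batch is full. */
        public void add(String record) {
            pending.add(record);
            if (pending.size() >= batchSize) {
                flush();
            }
        }

        /** Commits whatever is pending, even a partial batch (e.g. called from a timer). */
        public void flush() {
            committed += pending.size();
            pending.clear();
        }

        public int committed() {
            return committed;
        }

        public int pending() {
            return pending.size();
        }
    }

    public static void main(String[] args) {
        SizeBatcher batcher = new SizeBatcher(100);
        for (int i = 0; i < 1330; i++) {
            batcher.add("record-" + i);
        }
        // The last 30 records never reach the size threshold on their own.
        System.out.println("committed=" + batcher.committed() + " pending=" + batcher.pending());

        // An explicit flush, independent of batch size, commits the remainder.
        batcher.flush();
        System.out.println("committed=" + batcher.committed() + " pending=" + batcher.pending());
    }
}
```

In this simplified model the leftover records are committed only when `flush()` is called from outside, which is why a time-based flush trigger is the usual design choice for size-based batching.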
Topology:
Kafka-Spout --> DataProcessingBolt
DataProcessingBolt --> HiveBolt (Sink)
DataProcessingBolt --> JdbcBolt (Sink)