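I have a simple stateful stream-processing job, as shown in "Reciving Asynchronous Exception in Flink 1.7.2 Stateful-processing with KeyedProcessFunction and RocksDB state-backend". Its processElement function, however, includes a slow operation (~150 ms). The job reads from and writes to Kafka topics. The input topic holds about 7,000 messages, and the application reads them from the beginning (to simulate a job that is lagging behind at runtime). A separate Kafka consumer, with isolation.level set to read_committed, is connected to the output topic. That consumer receives the processed messages very slowly and at unpredictable times (after 8-15 minutes); sometimes it receives all 7,000 messages only after the last one has been processed.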
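The consumer settings described above could be sketched roughly as follows. Only isolation.level=read_committed comes from the description; the broker address, group id, offset reset policy, and deserializers are placeholder assumptions, and plain string keys are used so the snippet needs nothing beyond the JDK.

```scala
import java.util.Properties

// Sketch of the verification consumer's configuration.
val consumerProps = new Properties
// Assumed broker address, not taken from the original setup.
consumerProps.setProperty("bootstrap.servers", "localhost:9092")
// Hypothetical consumer group name.
consumerProps.setProperty("group.id", "output-topic-verifier")
// From the description: only records of committed transactions are returned.
consumerProps.setProperty("isolation.level", "read_committed")
// Assumed: start from the earliest offset when the group has no committed offset.
consumerProps.setProperty("auto.offset.reset", "earliest")
// Assumed: keys and values are plain strings.
consumerProps.setProperty("key.deserializer",
  "org.apache.kafka.common.serialization.StringDeserializer")
consumerProps.setProperty("value.deserializer",
  "org.apache.kafka.common.serialization.StringDeserializer")
```

With read_committed, the consumer does not return records that belong to a transaction that is still open, which is why the timing of the producer's commits matters for what this consumer sees.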
This is my processElement function:
def processElement(value: String,
                   ctx: KeyedProcessFunction[String, String, (String, Long)]#Context,
                   out: Collector[(String, Long)]): Unit = {
  // Slow operation (~150 ms)
  fileDownloader("https://my-domain.com/text/Sample-text-file-10kb.txt")
  val tmpSum: Long = state.value
  // Fall back to 0 when no sum has been stored for this key yet
  val currentSum =
    if (tmpSum != null) tmpSum
    else 0
  val newSum = currentSum + 1
  state.update(newSum)
  out.collect((value, newSum))
  log.info("collecting: " + (value, newSum))
}
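As a rough sanity check on the timings, the sketch below estimates how long the job needs to work through the backlog. It assumes (this is not stated in the post) that the ~150 ms operation runs once per message and that all messages are processed sequentially by a single parallel task; the variable names are illustrative.

```scala
// Back-of-the-envelope estimate of the total processing time.
val messageCount = 7000          // messages in the input topic
val perMessageMillis = 150L      // approximate cost of the slow operation
val totalMillis = messageCount * perMessageMillis
val totalMinutes = totalMillis / 60000.0
println(s"estimated processing time: $totalMinutes minutes")
```

Under those assumptions the estimate is 17.5 minutes, the same order of magnitude as the 8-15 minutes observed. With a parallelism above 1 the estimate would shrink proportionally.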
...the checkpoint configuration:
env.enableCheckpointing(1000)
env.setStateBackend(new RocksDBStateBackend("file:///tmp/flink-data/rocksdb", true))
env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE)
...the Kafka producer:
val producerProps = new Properties
producerProps.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers)
producerProps.setProperty(ProducerConfig.RETRIES_CONFIG, "2147483647")
producerProps.setProperty(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1")
producerProps.setProperty(ProducerConfig.ACKS_CONFIG, "all")
producerProps.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true")
val kafkaProducer = new FlinkKafkaProducer011[String](
  topic,
  new KeyedSerializationSchemaWrapper[String](new SimpleStringSchema),
  producerProps,
  Optional.of(new FlinkFixedPartitioner[String]),
  FlinkKafkaProducer011.Semantic.EXACTLY_ONCE,
  10 // kafkaProducersPoolSize
)
...and the modified Kafka broker configuration:
offsets.topic.replication.factor=1
offsets.topic.num.partitions=3
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
transaction.state.log.num.partitions=3
transaction.max.timeout.ms=3600000
transactional.id.expiration.ms=3600000
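Is there a configuration setting I am missing? Or is this the expected behavior of Flink stream processing with end-to-end exactly-once semantics enabled?
Thank you very much.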