I'm trying to poll the file size of a temporary Avro file that's being written to HDFS from a Kafka topic, but org.apache.hadoop.fs.FileStatus.getLen() keeps returning 0 bytes while the writer is still open and writing.
I could keep a counter of the length at the writer end, but the data is ultimately converted into a binary format (Avro) whose length differs from that of the original records. It could be approximated, but I'm looking for a more precise solution.
Is there a way to get the size of a still-open HDFS file, either from the HDFS side (io.confluent.connect.hdfs.storage.HdfsStorage) or from the file writer side (io.confluent.connect.storage.format.RecordWriter)?
Answer 0 (score: 0)
In the end, I extended the AvroRecordWriterProvider used by the RecordWriter and added a wrapper around the FSDataOutputStream so it can be queried from the TopicPartitionWriter.
Once it passes legal review, I'll push the code to a fork and post a link for anyone interested.
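Until the fork is available, here is a minimal sketch of the wrapper idea: count bytes as they pass through the output stream so the current size can be queried while the file is still open. In the actual connector this would wrap the FSDataOutputStream handed to the Avro writer; the class and method names below are illustrative assumptions, not the forked code.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: a counting wrapper around the underlying
// output stream, so the exact number of serialized (post-Avro)
// bytes written so far can be polled before the file is closed.
class CountingOutputStream extends FilterOutputStream {
    private long bytesWritten = 0;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        bytesWritten++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        bytesWritten += len;
    }

    /** Bytes written so far; safe to call while the stream is open. */
    long getBytesWritten() {
        return bytesWritten;
    }
}

public class Demo {
    public static void main(String[] args) throws IOException {
        // Stand-in for the real FSDataOutputStream target.
        CountingOutputStream cos =
            new CountingOutputStream(new ByteArrayOutputStream());
        cos.write(new byte[]{1, 2, 3, 4, 5}, 0, 5);
        cos.write(7);
        System.out.println(cos.getBytesWritten()); // prints 6
        cos.close();
    }
}
```

Because the count is taken after serialization, it reflects the true on-disk length rather than an approximation from record sizes. Note that Hadoop's own FSDataOutputStream also exposes getPos(), which may serve the same purpose without a wrapper.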