What does schema evolution mean for the Parquet and Avro file formats in Hive?

Time: 2019-04-08 02:56:37

Tags: hive

Can anyone explain what schema evolution means for the Parquet and Avro file formats in Hive?

1 answer:

Answer 0: (score: 0)

Schema evolution is simply the term for how a data store behaves when the schema changes. Users can start with a simple schema and gradually add more columns as needed. In this way, users may end up with multiple Parquet/Avro files with different but mutually compatible schemas.
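To make "different but mutually compatible schemas" concrete, here is a minimal sketch in plain Python. It is not how Hive, Parquet, or Avro implement this; it assumes a toy model where a schema is just a dict of column name to type name, and the helper `merge_schemas` and the sample columns are hypothetical names made up for illustration.

```python
# Toy model (an assumption for illustration): a schema is a dict mapping
# column name -> type name. Real Parquet/Avro schemas are much richer.

def merge_schemas(schemas):
    """Return the union of all columns, failing on conflicting types."""
    merged = {}
    for schema in schemas:
        for name, type_name in schema.items():
            if name in merged and merged[name] != type_name:
                # The same column with two different types is treated as
                # incompatible in this simplified model.
                raise ValueError(
                    f"incompatible types for column {name!r}: "
                    f"{merged[name]} vs {type_name}"
                )
            merged.setdefault(name, type_name)
    return merged


# Hypothetical schemas of two files written at different times.
file_v1 = {"id": "int", "name": "string"}
file_v2 = {"id": "int", "name": "string", "email": "string"}

merged = merge_schemas([file_v1, file_v2])
print(merged)
```

In this model the two files are compatible because the newer one only adds a column. Two files that declared `id` with different types would be rejected instead of merged.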

Say you have one Avro/Parquet file and you want to change its schema: you can rewrite that file with the new schema inside. But what if you have terabytes of Avro/Parquet files and you want to change their schema? Will you rewrite all of the data every time the schema changes?

Schema evolution allows you to update the schema used to write new data while maintaining backward compatibility with the schema(s) of your old data. You can then read it all together, as if all of the data had one schema. Of course, there are precise rules governing which changes are allowed in order to maintain compatibility. Those rules are listed under Schema Resolution in the Avro specification.
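The idea of reading old and new data "as if all of the data had one schema" can be sketched roughly as follows. This is a simplified illustration in plain Python, not the Avro library: it mimics three of Avro's record resolution rules (a field missing from the old data takes the reader's default, a field the reader does not know is ignored, and a missing field with no default is an error). The function `resolve_record`, the sentinel `_NO_DEFAULT`, and the sample rows are all hypothetical.

```python
# Sentinel meaning "the reader schema declares no default for this field".
_NO_DEFAULT = object()


def resolve_record(record, reader_fields):
    """Project a stored record onto the reader's fields.

    reader_fields maps field name -> default value (or _NO_DEFAULT).
    """
    resolved = {}
    for name, default in reader_fields.items():
        if name in record:
            # Field exists in the stored data: use the stored value.
            resolved[name] = record[name]
        elif default is not _NO_DEFAULT:
            # Field was added later: fall back to the reader's default.
            resolved[name] = default
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    # Fields present in the record but unknown to the reader are dropped.
    return resolved


# Hypothetical reader schema: "email" was added later with a default of None.
reader_fields = {"id": _NO_DEFAULT, "name": _NO_DEFAULT, "email": None}

old_row = {"id": 1, "name": "alice", "legacy_flag": True}  # written before the change
new_row = {"id": 2, "name": "bob", "email": "bob@example.com"}

rows = [resolve_record(r, reader_fields) for r in (old_row, new_row)]
for row in rows:
    print(row)
```

Both rows come back with the same three fields, so downstream code can treat them uniformly without the old file being rewritten. This is also why adding a field with a default is generally a safe change, while adding one without a default breaks reads of old data.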