I have the following ORC table definition that I want to replicate in Parquet (there are more fields that I have not shown):
CREATE EXTERNAL TABLE `test_a`(
`some_id` int,
`sha_sum` string,
`parent_sha_sum` string,
`md5_sum` string
)
PARTITIONED BY (
`server_date` date
)
CLUSTERED BY (
sha_sum
)
SORTED BY (
sha_sum, parent_sha_sum, md5_sum
)
INTO 256 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://cluster/user/myuser/test_a'
TBLPROPERTIES (
'orc.compress'='ZLIB',
'orc.create.index'='true',
'orc.stripe.size'='130023424',
'orc.row.index.stride'='64000'
);
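I would like to know how to replicate it in Parquet. I want to compress with ZLIB or something similar, I want to have indexes, and I may want to tune some of the TBLPROPERTIES that apply to Parquet. This is what I have so far: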
CREATE EXTERNAL TABLE `test_b`(
`some_id` int,
`sha_sum` string,
`parent_sha_sum` string,
`md5_sum` string
)
PARTITIONED BY (
`server_date` date
)
CLUSTERED BY (
sha_sum
)
SORTED BY (
sha_sum, parent_sha_sum, md5_sum
)
INTO 256 BUCKETS
STORED AS PARQUET
LOCATION 'hdfs://cluster/user/myuser/test_b'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true'
)
Is there a list of all the options available for Parquet through TBLPROPERTIES?
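To make the compression part of the question concrete, here is a minimal sketch of the kind of setting being asked about. It assumes the Hive version in use reads `parquet.compression` from the table properties, which may not hold for older releases. GZIP is chosen because it is DEFLATE-based and therefore the closest Parquet codec to ORC's ZLIB. It does not cover the indexing or stripe-size settings from the ORC table.

```sql
-- Assumption: this Hive version honors 'parquet.compression' as a table property.
-- GZIP is DEFLATE-based, the closest Parquet counterpart to 'orc.compress'='ZLIB'.
ALTER TABLE test_b SET TBLPROPERTIES ('parquet.compression'='GZIP');
```

The same key/value pair could instead be placed in the TBLPROPERTIES clause of the CREATE EXTERNAL TABLE statement for `test_b`.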