Question

I have an S3 bucket: s3://bucket-name/year=2018/month=xx/day=xx/hour=xx/minute=xx.

Correspondingly, my AWS Athena table has 5 partition columns (year, month, day, hour, minute).

I want to load all the data for October.

ALTER TABLE table_name add partition (all 5 partitions)
location "s3://data/year=xx/month=xx/.......";

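However, I would have to write this ALTER TABLE command for every minute-level partition, which is not feasible.

Is it possible to write a script in AWS Athena that loads all of the partitions?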

Answer 1

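You can use the MSCK REPAIR TABLE command. It works for your layout because the S3 prefixes already follow the Hive-style key=value convention (year=.../month=.../day=.../hour=.../minute=...), which is what MSCK REPAIR TABLE uses to discover partitions.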

https://docs.aws.amazon.com/athena/latest/ug/partitions.html

First, create a table similar to the following:

CREATE EXTERNAL TABLE `example`(
col1 string,
col2 string)
PARTITIONED BY ( 
`year` int, `month` int, `day` int, `hour` int, `minute` int)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
STORED AS INPUTFORMAT 
'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://data'

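After that, you should be able to run MSCK REPAIR TABLE example; and Athena will register every partition it finds under the table location. Because it scans the entire location, this can be slow when there are many minute-level partitions.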

If you want to keep using ALTER TABLE ADD PARTITION, you need to write a custom script. For that, boto3 or the Athena JDBC driver would be useful:

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/athena.html#Athena.Client.start_query_execution

https://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html
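A minimal sketch of such a script in Python, assuming the table from the DDL above (example, located at s3://data) and two-digit, zero-padded folder names such as month=10/day=01/hour=00/minute=00. It generates one PARTITION ... LOCATION clause per minute, groups them into multi-partition ALTER TABLE ... ADD IF NOT EXISTS statements (Athena accepts several PARTITION/LOCATION pairs in one statement), and submits them one at a time with boto3's start_query_execution. The function names, the batch size of 100, and the database and output-location arguments are hypothetical and should be adapted to your setup.

```python
import time
from datetime import datetime, timedelta


# Assumption: S3 folder names are zero-padded to two digits (month=10/day=01/...).
def partition_clauses(start, end, base="s3://data"):
    """Yield one 'PARTITION (...) LOCATION ...' clause per minute in [start, end)."""
    t = start
    while t < end:
        # Partition values are plain integers, matching the int columns in the DDL.
        spec = (f"year={t.year}, month={t.month}, day={t.day}, "
                f"hour={t.hour}, minute={t.minute}")
        # The location points at the actual (zero-padded) S3 prefix.
        path = (f"{base}/year={t.year}/month={t.month:02d}/day={t.day:02d}/"
                f"hour={t.hour:02d}/minute={t.minute:02d}/")
        yield f"PARTITION ({spec}) LOCATION '{path}'"
        t += timedelta(minutes=1)


def add_partition_statements(table, clauses, batch_size=100):
    """Group clauses into ALTER TABLE ... ADD IF NOT EXISTS statements.

    batch_size=100 is an arbitrary, conservative choice that keeps each
    query string well under Athena's maximum query length.
    """
    batch = []
    for clause in clauses:
        batch.append(clause)
        if len(batch) == batch_size:
            yield f"ALTER TABLE {table} ADD IF NOT EXISTS\n" + "\n".join(batch)
            batch = []
    if batch:  # leftover clauses that did not fill a complete batch
        yield f"ALTER TABLE {table} ADD IF NOT EXISTS\n" + "\n".join(batch)


def run_statements(statements, database, output_location, poll_seconds=1.0):
    """Submit each statement with boto3 and wait for it to finish.

    database and output_location (an S3 URI for query results) are
    placeholders for your own values.
    """
    import boto3  # imported here so the generators above work without boto3

    athena = boto3.client("athena")
    for sql in statements:
        qid = athena.start_query_execution(
            QueryString=sql,
            QueryExecutionContext={"Database": database},
            ResultConfiguration={"OutputLocation": output_location},
        )["QueryExecutionId"]
        # Poll until the query leaves the QUEUED/RUNNING states.
        while True:
            status = athena.get_query_execution(QueryExecutionId=qid)
            state = status["QueryExecution"]["Status"]["State"]
            if state not in ("QUEUED", "RUNNING"):
                break
            time.sleep(poll_seconds)
        if state != "SUCCEEDED":
            raise RuntimeError(f"query {qid} ended in state {state}")
```

For October 2018 this produces 31 × 24 × 60 = 44,640 partitions in 447 statements: pass add_partition_statements("example", partition_clauses(datetime(2018, 10, 1), datetime(2018, 11, 1))) to run_statements together with your own database name and S3 query-result location. Waiting for each query before submitting the next keeps the script within Athena's concurrent-query limits, and IF NOT EXISTS makes it safe to rerun after a failure.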

How can I load all partitions with a script in AWS Athena?
