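I have an Athena table with a date-based partition, like this:
20190218
I want to delete all the partitions that were created last year.
I tried the following queries, but they did not work.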
ALTER TABLE tblname DROP PARTITION (partition1 < '20181231');
ALTER TABLE tblname DROP PARTITION (partition1 > '20181010'), Partition (partition1 < '20181231');
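Answer 0 (score: 1)
While Athena SQL may not support this at the moment, the Glue API call GetPartitions (which Athena uses under the hood for queries) supports complex filter expressions, similar to what you can write in a SQL WHERE expression.
Instead of deleting partitions through Athena, you can call GetPartitions followed by BatchDeletePartition using the Glue API.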
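As a rough sketch of that GetPartitions plus BatchDeletePartition flow: the function below takes an already created Glue client (for example `boto3.client('glue')`) so it carries no AWS dependency of its own. The helper names `chunked` and `drop_partitions_matching` are made up for illustration, and the filter assumes the partition key is a string column called `partition1`, as in the question.

```python
from typing import Any, Dict, Iterator, List

# BatchDeletePartition accepts at most 25 partitions per request.
BATCH_LIMIT = 25


def chunked(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def drop_partitions_matching(glue: Any, database: str, table: str, expression: str) -> int:
    """Delete the partitions selected by `expression`; return how many were submitted."""
    paginator = glue.get_paginator("get_partitions")
    to_delete: List[Dict[str, List[str]]] = []
    # Collect all matches first so that deleting never interferes with paging.
    for page in paginator.paginate(DatabaseName=database, TableName=table, Expression=expression):
        for partition in page["Partitions"]:
            to_delete.append({"Values": partition["Values"]})
    for batch in chunked(to_delete, BATCH_LIMIT):
        response = glue.batch_delete_partition(
            DatabaseName=database, TableName=table, PartitionsToDelete=batch
        )
        # Partitions that could not be deleted are reported under "Errors".
        for error in response.get("Errors", []):
            print(f"Could not delete {error.get('PartitionValues')}: {error.get('ErrorDetail')}")
    return len(to_delete)


# Hypothetical filter for the question's table: every 2018 value of `partition1`.
LAST_YEAR = "partition1 >= '20180101' AND partition1 <= '20181231'"
```

Something like `drop_partitions_matching(boto3.client('glue'), 'database_name', 'tblname', LAST_YEAR)` would then remove only last year's partitions from the catalog. The data files in S3 are left untouched.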
Answer 1 (score: 0)
According to https://docs.aws.amazon.com/athena/latest/ug/alter-table-drop-partition.html, ALTER TABLE tblname DROP PARTITION takes a partition spec, so ranges are not allowed.
In Presto you could run DELETE FROM tblname WHERE ..., but Athena does not support DELETE either.
For these reasons, you need to rely on some external solution.
For example:
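One possible external approach, sketched here under assumptions and not taken from the answer itself, is to generate the exact partition specs that DROP PARTITION does accept. The snippet assumes one partition per day, a string partition key named `partition1`, and the yyyymmdd format from the question. The helper names and the chunk size of 50 specs per statement are arbitrary choices for illustration.

```python
from datetime import date, timedelta
from typing import Iterator, List


def daily_values(start: date, end: date) -> Iterator[str]:
    """Yield one yyyymmdd string per day from `start` to `end`, inclusive."""
    current = start
    while current <= end:
        yield current.strftime("%Y%m%d")
        current += timedelta(days=1)


def drop_statements(table: str, column: str, values: List[str], per_statement: int = 50) -> List[str]:
    """Build ALTER TABLE statements, each listing up to `per_statement` exact partition specs."""
    statements = []
    for start in range(0, len(values), per_statement):
        specs = ", ".join(
            f"PARTITION ({column} = '{value}')" for value in values[start:start + per_statement]
        )
        # IF EXISTS keeps the statement from failing on days that have no partition.
        statements.append(f"ALTER TABLE {table} DROP IF EXISTS {specs}")
    return statements


last_year = list(daily_values(date(2018, 1, 1), date(2018, 12, 31)))
for statement in drop_statements("tblname", "partition1", last_year):
    print(statement[:100], "...")
```

Each generated statement can then be submitted to Athena like any other query.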
Answer 2 (score: 0)
Here is a script that does what Theo recommended.
import json
import logging

import awswrangler as wr
import boto3
from botocore.exceptions import ClientError

logging.basicConfig(level=logging.INFO, format=logging.BASIC_FORMAT)
logger = logging.getLogger()


def delete_partitions(database_name: str, table_name: str):
    """Delete every partition of the table from the Glue Data Catalog."""
    client = boto3.client('glue')
    paginator = client.get_paginator('get_partitions')
    page_count = 0
    partition_count = 0
    # Pages of 20 stay below the 25-partition limit of BatchDeletePartition.
    # Pass Expression=... to paginate() to select only some of the partitions.
    for page in paginator.paginate(DatabaseName=database_name, TableName=table_name,
                                   PaginationConfig={'PageSize': 20}):
        page_count = page_count + 1
        partitions = page['Partitions']
        partitions_to_delete = []
        for partition in partitions:
            partition_count = partition_count + 1
            partitions_to_delete.append({'Values': partition['Values']})
            logger.info(f"Found partition {partition['Values']}")
        if partitions_to_delete:
            response = client.batch_delete_partition(DatabaseName=database_name, TableName=table_name,
                                                     PartitionsToDelete=partitions_to_delete)
            logger.info(f'Deleted partitions with response: {response}')
        else:
            logger.info('Done with all partitions')
    logger.info(f'Processed {partition_count} partitions in {page_count} pages')


def repair_table(database_name: str, table_name: str):
    """Run MSCK REPAIR TABLE and wait for the query to finish."""
    # Note: MSCK REPAIR TABLE adds back Hive-style partitions whose data is still in S3.
    client = boto3.client('athena')
    try:
        # Assumes the workgroup already has a query result location configured.
        response = client.start_query_execution(
            QueryString='MSCK REPAIR TABLE ' + table_name + ';',
            QueryExecutionContext={'Database': database_name},
        )
    except ClientError as err:
        logger.info(err.response['Error']['Message'])
    else:
        res = wr.athena.wait_query(query_execution_id=response['QueryExecutionId'])
        # default=str handles the datetime fields that json cannot serialize.
        logger.info(f"Query succeeded: {json.dumps(res, indent=2, default=str)}")


if __name__ == '__main__':
    table = 'table_name'
    database = 'database_name'
    delete_partitions(database_name=database, table_name=table)
    repair_table(database_name=database, table_name=table)
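The script above uses awswrangler only to wait for the query to finish. If that dependency is unwanted, a minimal polling loop over the Athena GetQueryExecution call could stand in for it. This is a sketch, not a drop-in replacement: the function name `wait_for_query` and its defaults are made up, and unlike `wr.athena.wait_query` it returns the final state instead of raising when the query fails.

```python
import time
from typing import Any

# States after which an Athena query no longer changes.
FINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(athena: Any, query_execution_id: str,
                   poll_seconds: float = 1.0, max_polls: int = 600) -> str:
    """Poll GetQueryExecution until the query reaches a final state and return that state."""
    for _ in range(max_polls):
        execution = athena.get_query_execution(QueryExecutionId=query_execution_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in FINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"Query {query_execution_id} did not finish after {max_polls} polls")
```

With a real client this would be called as `wait_for_query(boto3.client('athena'), response['QueryExecutionId'])`.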