I am trying to deep copy a table that occupies 12.5% of the total disk space (roughly 25% of the actual data on disk). While inserting into the new table (with the suggested compression), the query fails with a disk-full error.
How can I perform a deep copy of such a huge table without running into this problem?
Answer 0: (score: 0)
We ran into the same problem on our staging cluster. When a large Redshift table is heavily fragmented, a deep copy temporarily needs to hold the entire table in memory and on disk: all encoding on the table is stripped, and the data is re-encoded and re-sorted before being written into the new deep-copy target table. This requires a large amount of free disk space.
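For context, the standard Redshift deep-copy pattern looks like the sketch below (table names are illustrative, not from the original post). The key point is that until the old table is dropped, the cluster must hold both copies, so free space on the order of the full table size is needed:

```sql
-- Minimal deep-copy sketch; "my_table" is an illustrative name.
-- The cluster temporarily needs free space roughly equal to the table's
-- size, which is why this fails on a nearly full or fragmented cluster.
CREATE TABLE my_table_copy (LIKE my_table);  -- copies DDL, dist/sort keys

INSERT INTO my_table_copy
SELECT * FROM my_table;                      -- rows rewritten sorted and re-encoded

ALTER TABLE my_table RENAME TO my_table_old;
ALTER TABLE my_table_copy RENAME TO my_table;
DROP TABLE my_table_old;                     -- reclaims the old blocks
```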
We overcame this problem for a 1 TB table with 80% fragmentation by following the steps below:
As a best practice, disk fragmentation should be monitored and kept below 20%. The following query can be set up as an alert for monitoring cluster health, showing table fragmentation and stale statistics:
```sql
SELECT feedback_tbl.schema_name,
       feedback_tbl.table_name,
       CAST(info_tbl."size" / 1024 AS varchar(10)) + ' GB' AS table_size,
       COALESCE(unsorted::varchar(10), 'null') + '%'       AS fragment_pct,
       stats_off::varchar(10)                              AS stale_stat_pct
FROM (SELECT schema_name, table_name
      FROM (SELECT TRIM(n.nspname) AS schema_name,
                   TRIM(c.relname) AS table_name,
                   DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) AS qry_rnk,
                   COUNT(*)
            FROM stl_alert_event_log AS l
            JOIN (SELECT query, tbl, perm_table_name
                  FROM stl_scan
                  WHERE perm_table_name <> 'Internal Worktable'
                  GROUP BY query, tbl, perm_table_name) AS s
              ON s.query = l.query
            JOIN pg_class c
              ON c.oid = s.tbl
            JOIN pg_catalog.pg_namespace n
              ON n.oid = c.relnamespace
            WHERE l.userid > 1
              AND l.event_time >= DATEADD(DAY, -7, CURRENT_DATE)
              AND REGEXP_INSTR(solution, '.*VACUUM.*reclaim deleted.') > 0
              AND TRIM(n.nspname) = 'ivh'
            GROUP BY TRIM(n.nspname), TRIM(c.relname)) AS anlyz_tbl
      WHERE anlyz_tbl.qry_rnk < 25) AS feedback_tbl
JOIN svv_table_info info_tbl
  ON info_tbl."schema" = feedback_tbl.schema_name
 AND info_tbl."table" = feedback_tbl.table_name
WHERE info_tbl.unsorted > 5
   OR info_tbl.stats_off > 10;
```
The SQL is taken from the AWS Redshift Analyze & Vacuum utility.
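Once the monitoring query flags a fragmented table, vacuuming it before attempting the deep copy shrinks the temporary disk footprint the copy needs. A minimal sketch using standard Redshift VACUUM options ("my_table" is an illustrative name; the 99% threshold is an assumption, not from the original post):

```sql
-- Reclaim space from deleted rows first; cheaper than a full vacuum
VACUUM DELETE ONLY my_table;

-- Then re-sort the table; TO 99 PERCENT stops once it is 99% sorted
VACUUM SORT ONLY my_table TO 99 PERCENT;

-- Refresh planner statistics afterwards
ANALYZE my_table;
```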