How to measure table space on disk in RedShift / ParAccel

Time: 2013-10-22 05:53:41

Tags: amazon-web-services amazon-redshift paraccel

I have a table in RedShift. How can I see how much disk space it uses?

4 answers:

Answer 0: (score: 44)

Use the queries from this presentation: http://www.slideshare.net/AmazonWebServices/amazon-redshift-best-practices

Analyze disk space usage for the cluster:

select
    trim(pgdb.datname) as Database,
    trim(pgn.nspname) as Schema,
    trim(a.name) as Table,
    b.mbytes,
    a.rows
from (
    select db_id, id, name, sum(rows) as rows
    from stv_tbl_perm a
    group by db_id, id, name
) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (
    select tbl, count(*) as mbytes
    from stv_blocklist
    group by tbl
) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name; 

Analyze how a table is distributed across nodes:

select slice, col, num_values, minvalue, maxvalue
from svv_diskusage
where name = '__INSERT__TABLE__NAME__HERE__' and col = 0
order by slice, col;

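Answer 1: (score: 9)

I know this question is old and already has an accepted answer, but I have to point out that the answer is wrong. What the query reports as "mbytes" is actually the number of blocks. The result is correct only when the block size is 1 MB (which is the default).

If the block size is different (in my case, for example, it is 256K), you have to multiply the number of blocks by the block size in bytes. I suggest the following change to the query: multiply the number of blocks by the block size in bytes (262144 bytes), then divide by (1024 * 1024) to get the total in megabytes: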

select
    trim(pgdb.datname) as Database,
    trim(pgn.nspname) as Schema,
    trim(a.name) as Table,
    b.mbytes as previous_wrong_value,
    (b.mbytes * 262144)::bigint/(1024*1024) as "Total MBytes", 
    a.rows
from (
    select db_id, id, name, sum(rows) as rows
    from stv_tbl_perm a
    group by db_id, id, name
) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (
    select tbl, count(blocknum) as mbytes
    from stv_blocklist
    group by tbl
) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name; 
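
The conversion used in the query above is plain arithmetic, so it can be checked outside the database. Below is a minimal Python sketch of it. The helper name `blocks_to_mbytes` is made up for illustration, and the 262144-byte block size is simply the example value from this answer; your cluster's block size may differ.

```python
BYTES_PER_MB = 1024 * 1024

def blocks_to_mbytes(block_count, block_size_bytes=BYTES_PER_MB):
    """Convert a block count to whole megabytes.

    Uses integer division, mirroring the bigint arithmetic in the SQL.
    The default block size of 1 MB is an assumption taken from this answer.
    """
    return (block_count * block_size_bytes) // BYTES_PER_MB

# With 1 MB blocks the block count already equals the size in megabytes.
print(blocks_to_mbytes(4000))          # 4000
# With 256 KB blocks the same count is a quarter as many megabytes.
print(blocks_to_mbytes(4000, 262144))  # 1000
```

Note that the integer division truncates, so a table with only a few small blocks can report 0 MB.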

Answer 2: (score: 1)

Adding the owner and a schema filter to the query above:

select
 cast(use.usename as varchar(50)) as owner, 
 trim(pgdb.datname) as Database,
 trim(pgn.nspname) as Schema,
 trim(a.name) as Table,
 b.mbytes,
 a.rows
from 
 (select 
   db_id,
   id, 
   name,
   sum(rows) as rows
  from stv_tbl_perm a
  group by db_id, id, name
 ) as a
 join pg_class as pgc on pgc.oid = a.id
 left join pg_user use on (pgc.relowner = use.usesysid)
 join pg_namespace as pgn on pgn.oid = pgc.relnamespace 
   -- leave out system schemas
   and pgn.nspowner > 1
 join pg_database as pgdb on pgdb.oid = a.db_id
 join 
  (select 
    tbl,
    count(*) as mbytes
   from stv_blocklist
   group by tbl
 ) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name;

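Answer 3: (score: 0)

I was facing an uneven distribution problem, so I thought I would expand on this. I added some joins and fields so that space can be analyzed by node and slice. I also added the max/min values of column 0 and the number of values per slice.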

select
 cast(use.usename as varchar(50)) as owner, 
 trim(pgdb.datname) as Database,
 trim(pgn.nspname) as Schema,
 trim(a.name) as Table,
 a.node,
 a.slice,
 b.mbytes,
 a.rows,
 a.num_values,
 a.minvalue,
 a.maxvalue
from 
 (select 
   a.db_id,
   a.id, 
   s.node,
   s.slice,
   a.name,
   d.num_values,
   d.minvalue,
   d.maxvalue,
   sum(rows) as rows
  from stv_tbl_perm a
  inner join stv_slices s on a.slice = s.slice
  inner join (
    select tbl, slice, sum(num_values) as num_values, min(minvalue) as minvalue, max(maxvalue) as maxvalue
    from svv_diskusage
    where col = 0
    group by 1, 2) d on a.id = d.tbl and a.slice = d.slice
  group by 1, 2, 3, 4, 5, 6, 7, 8
 ) as a
 join pg_class as pgc on pgc.oid = a.id
 left join pg_user use on (pgc.relowner = use.usesysid)
 join pg_namespace as pgn on pgn.oid = pgc.relnamespace 
   -- leave out system schemas
   and pgn.nspowner > 1
 join pg_database as pgdb on pgdb.oid = a.db_id
 join 
  (select 
    tbl,
    slice,
    count(*) as mbytes
   from stv_blocklist
   group by tbl, slice
 ) b on a.id = b.tbl
    and a.slice = b.slice
order by mbytes desc, a.db_id, a.name, a.node;
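
Once the per-slice block counts from a query like this have been exported, one rough way to put a number on how uneven the distribution is would be to compare the largest slice against the average. This is only a sketch: the function name `skew_ratio` and the sample figures are invented for illustration and do not come from a real cluster.

```python
def skew_ratio(blocks_per_slice):
    """Return max / mean of the per-slice block counts.

    A value of 1.0 means the table is spread evenly; larger values mean
    one slice holds disproportionately more data than the average.
    """
    if not blocks_per_slice:
        raise ValueError("need at least one slice")
    counts = list(blocks_per_slice.values())
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 1.0  # an empty table is trivially even
    return max(counts) / mean

# Hypothetical data: slice number -> block count for one table.
sample = {0: 100, 1: 100, 2: 100, 3: 500}
print(skew_ratio(sample))  # 2.5
```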