What does it mean when the stats_off column in svv_table_info has a value of 99%?

Time: 2017-02-22 04:45:36

Tags: amazon-redshift

For one of my tables, the stats_off column in svv_table_info has a value of 99%. What does this mean, and how do I fix it?

I have looked at the ANALYZE and VACUUM history for this table. Do ANALYZE and VACUUM have any effect on this column's value?

1 Answer:

Answer 0: (score: 6)

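The VACUUM command reviews the table and rearranges the data on disk as needed, which affects the unsorted and empty columns. The closer these are to 0, the better.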

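The ANALYZE command reviews the table and recomputes the relevant statistics, which affects the stats_off column. The closer it is to 0, the better.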
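To see where a table currently stands on all three columns, a query along these lines can be run against svv_table_info. This is a minimal sketch: the schema `public` and the table name `my_table` are placeholders, not names from the question.

```sql
-- Placeholder names: replace 'public' and 'my_table' with your own schema and table.
-- "schema" and "table" are quoted because they are reserved words.
SELECT "schema",
       "table",
       unsorted,   -- percent of rows not in sort order (VACUUM lowers this)
       empty,      -- as above, lowered by VACUUM
       stats_off,  -- 0 = statistics current, 100 = out of date (ANALYZE lowers this)
       tbl_rows
FROM svv_table_info
WHERE "schema" = 'public'
  AND "table" = 'my_table';
```

Running it before and after maintenance shows whether VACUUM or ANALYZE actually moved the values.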

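Even after you run ANALYZE, the value may not change much. To get it as low as possible, run VACUUM first. A table's statistics include old records that have been deleted; Redshift simply skips these at query time, but they still affect overall query performance. Running VACUUM on the table first gives ANALYZE the best possible view of the usable data.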
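The ordering described above can be sketched as follows. The table name `public.my_table` is a placeholder, and each statement is meant to be run on its own, since VACUUM cannot run inside a transaction block.

```sql
-- Placeholder table name: public.my_table

-- 1. Reclaim space from deleted rows and re-sort the data.
VACUUM FULL public.my_table;

-- 2. Recompute statistics now that the deleted rows are gone.
ANALYZE public.my_table;

-- 3. Check the result; stats_off should now be at or near 0.
SELECT "table", unsorted, empty, stats_off
FROM svv_table_info
WHERE "schema" = 'public'
  AND "table" = 'my_table';
```

On a large table the VACUUM step can take a long time, so it is usually worth scheduling it for a quiet period.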

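Just because a table's statistics are stale does not mean they are necessarily causing a problem. What you need to look for are alerts from the query planner, to see whether it is complaining about the statistics on the table. You will typically see these complaints when performing table joins. This query checks whether any such complaints were logged in the last day and produces a list of commands to run if needed...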

SELECT DISTINCT 'ANALYZE ' + feedback_tbl.schema_name + '.' + feedback_tbl.table_name + ';' AS command
FROM ((SELECT
         TRIM(n.nspname) schema_name,
         c.relname       table_name
       FROM (SELECT
               TRIM(SPLIT_PART(SPLIT_PART(a.plannode, ':', 2), ' ', 2)) AS Table_Name,
               COUNT(a.query),
               DENSE_RANK()
               OVER (
                 ORDER BY COUNT(a.query) DESC)                          AS qry_rnk
             FROM stl_explain a,
               stl_query b
             WHERE a.query = b.query
                   AND CAST(b.starttime AS DATE) >= dateadd(DAY, -1, CURRENT_DATE)
                   AND a.userid > 1
                   AND a.plannode LIKE '%missing statistics%'
                   AND a.plannode NOT LIKE '%_bkp_%'
             GROUP BY Table_Name) miss_tbl
         LEFT JOIN pg_class c ON c.relname = TRIM(miss_tbl.table_name)
         LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
       WHERE miss_tbl.qry_rnk <= 25)
      -- Get the top N rank tables based on the stl_alert_event_log alerts
      UNION
      SELECT
        schema_name,
        table_name
      FROM (SELECT
              TRIM(n.nspname)              schema_name,
              c.relname                    table_name,
              DENSE_RANK()
              OVER (
                ORDER BY COUNT(*) DESC) AS qry_rnk,
              COUNT(*)
            FROM stl_alert_event_log AS l
              JOIN (SELECT
                      query,
                      tbl,
                      perm_table_name
                    FROM stl_scan
                    WHERE perm_table_name <> 'Internal Worktable'
                    GROUP BY query,
                      tbl,
                      perm_table_name) AS s ON s.query = l.query
              JOIN pg_class c ON c.oid = s.tbl
              JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
            WHERE l.userid > 1
                  AND l.event_time >= dateadd(DAY, -1, CURRENT_DATE)
                  AND l.Solution LIKE '%ANALYZE command%'
            GROUP BY TRIM(n.nspname),
              c.relname) anlyz_tbl
      WHERE anlyz_tbl.qry_rnk < 25) feedback_tbl
  JOIN svv_table_info info_tbl
    ON info_tbl.schema = feedback_tbl.schema_name
       AND info_tbl.table = feedback_tbl.table_name
WHERE info_tbl.stats_off :: DECIMAL(32, 4) > 10 :: DECIMAL(32, 4)
      AND TRIM(info_tbl.schema) = 'public'
ORDER BY info_tbl.size ASC;
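If the full query is more than you need, a smaller hand-written check (not taken from Amazon's scripts) can list the raw planner alerts from the last day that recommend ANALYZE. It assumes only the standard stl_alert_event_log columns and reuses the same LIKE pattern as the query above; the LIMIT of 20 is arbitrary.

```sql
-- Recent planner alerts whose suggested fix mentions the ANALYZE command.
SELECT query,
       TRIM(event)    AS event,
       TRIM(solution) AS solution,
       event_time
FROM stl_alert_event_log
WHERE userid > 1                                       -- skip system activity
  AND event_time >= DATEADD(DAY, -1, CURRENT_DATE)
  AND solution LIKE '%ANALYZE command%'
ORDER BY event_time DESC
LIMIT 20;
```

An empty result suggests that stale statistics are not currently triggering planner alerts, even if stats_off is high.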

While we are at it, this query finds tables that need a VACUUM command...

SELECT 'VACUUM FULL ' + "schema" + '.' + "table" + ';' AS command
FROM svv_table_info
WHERE (unsorted > 5 OR empty > 5)
      AND size < 716800;

These queries use the recommended thresholds defined by Amazon and come from Amazon's public Python scripts for managing Redshift clusters.