Improving PostgreSQL query performance for a left join over roughly 1 million rows

Date: 2016-02-16 16:53:49

Tags: java hibernate postgresql

I am using PostgreSQL 9.2 on Windows 7 64-bit with 6 GB of RAM. This is a Java enterprise project.

I have to show order-related information on my page. Three tables are combined with left joins.

Tables:

  1. TV_HD (389772 rows)
  2. TV_SNAPSHOT (1564756 rows)
  3. TD_MAKKA (419298 rows)

After left joining the 3 tables, the query returns 487252 rows. This number also grows day by day.


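Table relations:

  1. TV_HD has a one-to-many relation with TV_SNAPSHOT
  2. TV_HD has a one-to-many relation with TD_MAKKA

For better understanding, here is a view of each table for a single order, together with the SQL query used: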

      SELECT * FROM tv_hd WHERE urino = 1630799;

(image)

      SELECT * FROM tv_snapshot WHERE urino = 1630799;

(image)

      SELECT * FROM td_makka WHERE urino = 1630799;

(image)

The query takes about 90 seconds to run. How can I improve its performance?

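I have also considered indexing. But as far as I know, an index is actually used when we want about 2%-4% of a table's data. In my case I need all the data from these 3 tables.

Here is the query: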

      SELECT count(*)
      FROM (SELECT HD.URINO
            FROM
              TV_HD HD
              LEFT JOIN TV_SNAPSHOT T ON (HD.URINO = T.URINO AND HD.TCODE = T.TCODE AND T.DELFLG = 0 AND T.SYUBETSU = 1)
              LEFT JOIN TV_SNAPSHOT T_SQ
                ON (HD.URINO = T_SQ.URINO AND HD.SQCODE = T_SQ.TCODE AND T_SQ.DELFLG = 0 AND T_SQ.SYUBETSU = 3)
              LEFT JOIN (SELECT N.URINO
                         FROM
                           TD_MAKKA N
                         WHERE
                           N.UPDATETIME IN (
                             SELECT MIN(NMIN.UPDATETIME)
                             FROM
                               TD_MAKKA NMIN
                             WHERE
                               N.URINO = NMIN.URINO
                               AND
                               NMIN.TORIKESHIFLG <> -1
                           )
                        ) NYUMIN
                ON (HD.URINO = NYUMIN.URINO)
              LEFT JOIN
              (
                SELECT
                  NSUM.URINO,
                  SUM(COALESCE(NSUM.NYUKIN, 0))                                                             NYUKIN,
                  SUM(COALESCE(NSUM.NYUKIN, 0)) + SUM(COALESCE(NSUM.TESU, 0)) + SUM(COALESCE(NSUM.SOTA, 0)) SUMNYUKIN
                FROM
                  TD_MAKKA NSUM
                GROUP BY
                  URINO
              ) NYUSUM
                ON (HD.URINO = NYUSUM.URINO)
              LEFT JOIN
              (
                SELECT N.URINO
                FROM
                  TD_MAKKA N
                WHERE
                  UPDATETIME = (
                    SELECT MAX(UPDATETIME)
                    FROM
                      TD_MAKKA NMAX
                    WHERE
                      N.URINO = NMAX.URINO
                      AND
                      NMAX.TORIKESHIFLG <> -1
                  )
              ) NYUMAX
                ON (HD.URINO = NYUMAX.URINO)
            WHERE ((HD.URIBRUI <> '1') OR (HD.URIBRUI = '1' AND T_SQ.NYUKOBEFLG = '-1'))
            ORDER BY
              HD.URINO DESC
           ) COUNT_
      

Here is the result of EXPLAIN ANALYZE:

      Aggregate  (cost=7246861.21..7246861.22 rows=1 width=0) (actual time=69549.159..69549.159 rows=1 loops=1)
        ->  Merge Left Join  (cost=7240188.92..7242117.36 rows=379508 width=6) (actual time=68602.689..69510.563 rows=487252 loops=1)
              Merge Cond: (hd.urino = n.urino)
              ->  Sort  (cost=3727299.33..3728248.10 rows=379508 width=6) (actual time=62160.072..62557.132 rows=420036 loops=1)
                    Sort Key: hd.urino
                    Sort Method: external merge  Disk: 6984kB
                    ->  Hash Right Join  (cost=169264.26..3686940.26 rows=379508 width=6) (actual time=54796.930..60172.248 rows=420036 loops=1)
                          Hash Cond: (n.urino = hd.urino)
                          ->  Seq Scan on td_makka n  (cost=0.00..3511201.36 rows=209673 width=6) (actual time=24.326..4640.020 rows=419143 loops=1)
                                Filter: (SubPlan 1)
                                Rows Removed by Filter: 155
                                SubPlan 1
                                  ->  Aggregate  (cost=8.33..8.34 rows=1 width=23) (actual time=0.009..0.009 rows=1 loops=419298)
                                        ->  Index Scan using idx_td_makka on td_makka nmin  (cost=0.00..8.33 rows=1 width=23) (actual time=0.006..0.007 rows=1 loops=419298)
                                              Index Cond: (n.urino = urino)
                                              Filter: (torikeshiflg <> (-1)::numeric)
                                              Rows Removed by Filter: 0
                          ->  Hash  (cost=163037.41..163037.41 rows=379508 width=6) (actual time=54771.078..54771.078 rows=386428 loops=1)
                                Buckets: 4096  Batches: 16  Memory Usage: 737kB
                                ->  Hash Right Join  (cost=75799.55..163037.41 rows=379508 width=6) (actual time=51599.167..54605.901 rows=386428 loops=1)
                                      Hash Cond: ((t_sq.urino = hd.urino) AND (t_sq.tcode = hd.sqcode))
                                      Filter: ((hd.uribrui <> '1'::bpchar) OR ((hd.uribrui = '1'::bpchar) AND (t_sq.nyukobeflg = (-1)::numeric)))
                                      Rows Removed by Filter: 3344
                                      ->  Seq Scan on tv_snapshot t_sq  (cost=0.00..73705.42 rows=385577 width=15) (actual time=0.053..2002.953 rows=389983 loops=1)
                                            Filter: ((delflg = 0::numeric) AND (syubetsu = 3::numeric))
                                            Rows Removed by Filter: 1174773
                                      ->  Hash  (cost=68048.99..68048.99 rows=389771 width=14) (actual time=51596.055..51596.055 rows=389772 loops=1)
                                            Buckets: 4096  Batches: 16  Memory Usage: 960kB
                                            ->  Hash Right Join  (cost=21125.85..68048.99 rows=389771 width=14) (actual time=579.405..51348.270 rows=389772 loops=1)
                                                  Hash Cond: (nyusum.urino = hd.urino)
                                                  ->  Subquery Scan on nyusum  (cost=0.00..35839.52 rows=365638 width=6) (actual time=17.435..49996.674 rows=385537 loops=1)
                                                        ->  GroupAggregate  (cost=0.00..32183.14 rows=365638 width=34) (actual time=17.430..49871.702 rows=385537 loops=1)
                                                              ->  Index Scan using idx_td_makka on td_makka nsum  (cost=0.00..21456.76 rows=419345 width=34) (actual time=0.017..48357.702 rows=419298 loops=1)
                                                  ->  Hash  (cost=13969.71..13969.71 rows=389771 width=20) (actual time=491.549..491.549 rows=389772 loops=1)
                                                        Buckets: 4096  Batches: 32  Memory Usage: 567kB
                                                        ->  Seq Scan on tv_hd hd  (cost=0.00..13969.71 rows=389771 width=20) (actual time=0.052..242.415 rows=389772 loops=1)
              ->  Sort  (cost=3512889.60..3512894.84 rows=2097 width=6) (actual time=6442.600..6541.728 rows=486359 loops=1)
                    Sort Key: n.urino
                    Sort Method: external sort  Disk: 8600kB
                    ->  Seq Scan on td_makka n  (cost=0.00..3512773.90 rows=2097 width=6) (actual time=0.135..4053.116 rows=419143 loops=1)
                          Filter: ((updatetime)::text = (SubPlan 2))
                          Rows Removed by Filter: 155
                          SubPlan 2
                            ->  Aggregate  (cost=8.33..8.34 rows=1 width=23) (actual time=0.008..0.008 rows=1 loops=419298)
                                  ->  Index Scan using idx_td_makka on td_makka nmax  (cost=0.00..8.33 rows=1 width=23) (actual time=0.005..0.006 rows=1 loops=419298)
                                        Index Cond: (n.urino = urino)
                                        Filter: (torikeshiflg <> (-1)::numeric)
                                        Rows Removed by Filter: 0
      Total runtime: 69575.139 ms
      

Here are the details of the EXPLAIN ANALYZE result:

http://explain.depesz.com/s/23Fg

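2 Answers:

Answer 0: (score: 3)

Step 1: You can remove any extra columns that are not needed in the SELECT list, since you only need to count the total number of rows. For example: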

select count(*) from ( SELECT
    HD.URINO
FROM
    TV_HD HD
    LEFT JOIN TV_SNAPSHOT T ON (HD.URINO = T.URINO AND HD.TCODE = T.TCODE AND T.DELFLG = 0 AND T.SYUBETSU = 1)
    LEFT JOIN TV_SNAPSHOT T_SQ ON (HD.URINO = T_SQ.URINO AND HD.SQCODE = T_SQ.TCODE AND T_SQ.DELFLG = 0 AND T_SQ.SYUBETSU = 3)
    LEFT JOIN (SELECT
                    N.URINO
            FROM
                TD_MAKKA N
            WHERE
                N.UPDATETIME IN (
                    SELECT
                        MIN (NMIN.UPDATETIME)
                    FROM
                        TD_MAKKA NMIN
                    WHERE
                        N.URINO = NMIN.URINO
                    AND
                        NMIN.TORIKESHIFLG <> -1 
                )
        ) NYUMIN
    ON  (HD.URINO = NYUMIN.URINO) 
            LEFT JOIN
                (
                    SELECT
                        NSUM.URINO
                        ,SUM (COALESCE(NSUM.NYUKIN ,0)) NYUKIN
                        ,SUM (COALESCE(NSUM.NYUKIN ,0)) + SUM (COALESCE(NSUM.TESU ,0)) + SUM (COALESCE(NSUM.SOTA ,0)) SUMNYUKIN
                    FROM
                        TD_MAKKA NSUM
                    GROUP BY
                        URINO
                ) NYUSUM
            ON  (HD.URINO = NYUSUM.URINO)
            LEFT JOIN
                (
                    SELECT
                         N.URINO
                    FROM
                        TD_MAKKA N
                    WHERE
                        UPDATETIME = (
                            SELECT
                                MAX (UPDATETIME)
                            FROM
                                TD_MAKKA NMAX
                            WHERE
                                N.URINO = NMAX.URINO
                            AND
                                NMAX.TORIKESHIFLG <> -1 
                        )
               ) NYUMAX
            ON  (HD.URINO = NYUMAX.URINO)
WHERE ( (HD.URIBRUI <> '1') OR ( HD.URIBRUI = '1' AND T_SQ.NYUKOBEFLG = '-1' ) )
 ORDER BY 
 HD.URINO DESC
  ) COUNT_

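Step 2: You can drop left joins that cannot affect the row count. The NYUSUM subquery is grouped by URINO, so it returns at most one row per URINO and joining it neither adds nor removes rows; the ORDER BY is likewise unnecessary for a count. For example: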

select count(*) from ( SELECT
    HD.URINO
FROM
    TV_HD HD
    LEFT JOIN TV_SNAPSHOT T ON (HD.URINO = T.URINO AND HD.TCODE = T.TCODE AND T.DELFLG = 0 AND T.SYUBETSU = 1)
    LEFT JOIN TV_SNAPSHOT T_SQ ON (HD.URINO = T_SQ.URINO AND HD.SQCODE = T_SQ.TCODE AND T_SQ.DELFLG = 0 AND T_SQ.SYUBETSU = 3)
    LEFT JOIN (SELECT
                    N.URINO
            FROM
                TD_MAKKA N
            WHERE
                N.UPDATETIME IN (
                    SELECT
                        MIN (NMIN.UPDATETIME)
                    FROM
                        TD_MAKKA NMIN
                    WHERE
                        N.URINO = NMIN.URINO
                    AND
                        NMIN.TORIKESHIFLG <> -1 
                )
        ) NYUMIN
    ON  (HD.URINO = NYUMIN.URINO) 
            LEFT JOIN
                (
                    SELECT
                         N.URINO
                    FROM
                        TD_MAKKA N
                    WHERE
                        UPDATETIME = (
                            SELECT
                                MAX (UPDATETIME)
                            FROM
                                TD_MAKKA NMAX
                            WHERE
                                N.URINO = NMAX.URINO
                            AND
                                NMAX.TORIKESHIFLG <> -1 
                        )
               ) NYUMAX
            ON  (HD.URINO = NYUMAX.URINO)
WHERE ( (HD.URIBRUI <> '1') OR ( HD.URIBRUI = '1' AND T_SQ.NYUKOBEFLG = '-1' ) )

  ) COUNT_
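
The reason the NYUSUM join can go without changing the count can be checked on a toy dataset. The sketch below is only an illustration under stated assumptions: Python's built-in sqlite3 module with an in-memory database stands in for PostgreSQL, and the tiny tables hd and makka are hypothetical miniatures of TV_HD and TD_MAKKA. It compares a left join against a subquery grouped by the join key with a left join against the ungrouped table.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; hd and makka are hypothetical
# miniature versions of TV_HD and TD_MAKKA.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hd (urino INTEGER PRIMARY KEY);
    CREATE TABLE makka (urino INTEGER, nyukin INTEGER);
    INSERT INTO hd VALUES (1), (2), (3);
    -- order 1 has two payment rows, order 2 has one, order 3 has none
    INSERT INTO makka VALUES (1, 100), (1, 50), (2, 70);
""")

def count(sql):
    """Run a query that returns a single count and return it as an int."""
    return conn.execute(sql).fetchone()[0]

base = count("SELECT count(*) FROM hd")

# Grouped subquery: at most one row per urino, so the join keeps the count.
grouped = count("""
    SELECT count(*)
    FROM hd
    LEFT JOIN (SELECT urino, SUM(nyukin) AS nyukin
               FROM makka GROUP BY urino) nyusum
      ON hd.urino = nyusum.urino
""")

# Ungrouped table: order 1 matches two rows, so the count grows.
ungrouped = count("""
    SELECT count(*)
    FROM hd
    LEFT JOIN makka ON hd.urino = makka.urino
""")

print(base, grouped, ungrouped)
```

The same reasoning does not automatically apply to the NYUMIN and NYUMAX joins. In the EXPLAIN ANALYZE output from the question, the row count rises from 386428 to 420036 and then to 487252 across the two joins on td_makka n, which suggests those joins do add rows and so cannot be removed without changing the result.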

Step 3: You can use pgAdmin's graphical explain plan to analyze the query and avoid other unnecessary execution overhead.
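
Without a graphical tool, a rough first pass over the text plan is also possible. The following Python sketch is only an illustration: the regular expression and the helper name slowest_nodes are assumptions of this example, not part of PostgreSQL or pgAdmin. It reads the upper "actual time" bound from a few node lines copied from the question's plan and ranks them.

```python
import re

# A few node lines copied from the EXPLAIN ANALYZE output in the question.
PLAN = """\
->  Seq Scan on tv_snapshot t_sq  (cost=0.00..73705.42 rows=385577 width=15) (actual time=0.053..2002.953 rows=389983 loops=1)
->  GroupAggregate  (cost=0.00..32183.14 rows=365638 width=34) (actual time=17.430..49871.702 rows=385537 loops=1)
->  Index Scan using idx_td_makka on td_makka nsum  (cost=0.00..21456.76 rows=419345 width=34) (actual time=0.017..48357.702 rows=419298 loops=1)
->  Seq Scan on tv_hd hd  (cost=0.00..13969.71 rows=389771 width=20) (actual time=0.052..242.415 rows=389772 loops=1)
"""

# Hypothetical pattern: node label, upper "actual time" bound, loop count.
NODE = re.compile(
    r"->\s+(?P<node>.+?)\s+\(cost=.*?"
    r"\(actual time=\d+\.\d+\.\.(?P<total>\d+\.\d+)"
    r" rows=\d+ loops=(?P<loops>\d+)\)"
)

def slowest_nodes(plan_text):
    """Return (node, total_ms) pairs sorted by inclusive time, slowest first."""
    found = []
    for line in plan_text.splitlines():
        m = NODE.search(line)
        if m:
            # The reported time is per loop, so multiply by the loop count.
            total_ms = float(m.group("total")) * int(m.group("loops"))
            found.append((m.group("node"), total_ms))
    return sorted(found, key=lambda item: item[1], reverse=True)

for node, ms in slowest_nodes(PLAN):
    print(f"{ms:10.1f} ms  {node}")
```

The times are inclusive of child nodes, so the GroupAggregate figure contains the index scan beneath it. On these lines the index scan over td_makka nsum, which feeds the NYUSUM aggregate, accounts for roughly 48 of the 69.5 seconds of total runtime, which is consistent with removing that join in Step 2.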

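Answer 1: (score: 1)

Based on the query:

The actual requirement here is to count all the records returned by the inner SQL.

Optimization ideas for counting all records:

  1. Remove unnecessary fields from the SELECT query
  2. Remove the ORDER BY ASC / DESC part (saves 7% - 10%)
  3. Remove aggregate functions (AVG, SUM, COUNT, etc.)
  4. Use standard VACUUM to reclaim the storage occupied by dead tuples.
  5. Use http://explain.depesz.com/ to inspect the plan
  6. Study the result of "EXPLAIN ANALYZE [your_query_here]"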

Explanation 1: Remove unnecessary fields from the SELECT query

    select count(*) from ( SELECT
        HD.URINO
        /*HD.URIBRUI,
        HD.TCODE,
        HD.SQCODE*/
    FROM
        TV_HD HD) COUNT_
    

Explanation 2: Remove the ORDER BY ASC / DESC part (saves 7% - 10%)

    select count(*) from ( SELECT
        HD.URINO
    FROM
        TV_HD HD
        /*ORDER BY HD.URINO DESC*/) COUNT_
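
That the inner ORDER BY cannot affect the count is easy to confirm on a toy table. As an assumption of this sketch, Python's sqlite3 module and a hypothetical three-row table hd stand in for PostgreSQL and TV_HD. It checks only that the result is identical and says nothing about how much time is saved.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; hd is a hypothetical stand-in
# for TV_HD.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hd (urino INTEGER);
    INSERT INTO hd VALUES (3), (1), (2);
""")

# Count with the inner ORDER BY, as in the original query.
with_order = conn.execute(
    "SELECT count(*) FROM (SELECT urino FROM hd ORDER BY urino DESC) AS count_"
).fetchone()[0]

# Count with the ORDER BY removed.
without_order = conn.execute(
    "SELECT count(*) FROM (SELECT urino FROM hd) AS count_"
).fetchone()[0]

print(with_order == without_order)
```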
    

Explanation 3: Remove aggregate functions (AVG, SUM, COUNT, etc.)

    select count(*) from ( SELECT
        name
        /*MAX(salary),
        AVG(salary)*/
    FROM Emp) COUNT_
    

Explanation 4: Use standard VACUUM to reclaim the storage occupied by dead tuples.

    VACUUM (VERBOSE, ANALYZE) your_table;
    

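In normal PostgreSQL operation, tuples that are deleted or made obsolete by an update are not physically removed from their table; they remain present until a VACUUM is done. It is therefore necessary to run VACUUM periodically, especially on frequently updated tables.

There are two variants of VACUUM: standard VACUUM and VACUUM FULL.

VACUUM FULL can reclaim more disk space but runs much more slowly. In addition, the standard form of VACUUM can run in parallel with production database operations. (Commands such as SELECT, INSERT, UPDATE, and DELETE continue to work normally, but you cannot modify the definition of a table with commands such as ALTER TABLE while it is being vacuumed.) VACUUM FULL requires an exclusive lock on the table it is working on, so it cannot be done in parallel with other use of the table.

In general, therefore, administrators should strive to use standard VACUUM and avoid VACUUM FULL.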

Details:

    1. http://www.postgresql.org/docs/9.1/static/sql-vacuum.html
    2. http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html

Thanks for your time.