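Avoiding multiple subqueries

Time: 2015-10-21 20:10:51

Tags: sql postgresql

I'm working with a long, heavily queried table (> 500 million rows), so it is very important to avoid a large number of queries.

Currently I need to fetch some values under certain conditions (explained below) and then check whether those values appear in other sets of values (all of them pointing at the same fields). I'm using WITH to create a view of the table.

So, this is the table layout (table employee):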

+--------+-------------+-----------+--------+---------+-----------+
| period | employee_id | operation | sub_op | payment | work_zone |
+--------+-------------+-----------+--------+---------+-----------+

The period has the format "YYMM"; one period is one month.
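
Because a YYMM value is not a plain month counter, subtracting 36 from 1109 does not give the period three years earlier. The following is a minimal sketch of the arithmetic, assuming periods are stored as integers and that the two-digit year never wraps around a century; `shift_period` is a hypothetical helper, not something from the original query.

```python
def shift_period(period: int, months: int) -> int:
    """Shift a YYMM period by a number of months (negative means earlier)."""
    year, month = divmod(period, 100)            # 1109 -> (11, 9)
    index = year * 12 + (month - 1) + months     # months counted from year 00
    new_year, new_month0 = divmod(index, 12)
    return new_year * 100 + new_month0 + 1


current = 1109
previous = shift_period(current, -1)    # the period right before the current one
oldest = shift_period(current, -36)     # the 36th period before the current one

print(f"{previous:04d}", f"{oldest:04d}")  # prints: 1108 0809
```

Under this convention the 36 periods before 1109 run from 0809 to 1108, which is one month later than the 0808 lower bound used in the query.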

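Of course the table has many more columns than this sample, but these are the only fields I need in the query. First a brief explanation of what I need, then the query itself.

I need to get all the employee_id values in the current period that have a payment of at least $250 and a specific operation (I first group the operations by their sub_op value). The operation value in question is 97; the query shows how I group it.

Then, from these values, I group by work_zone and by the grouped operation value. This is where the subqueries start... I need:

  1. All those values that are not in the previous period.
  2. All those values that are not in any of the past 36 periods (3 years).
  3. All those values that are in at least one of the past 36 periods.
  4. All those values that are in at least one of the past 36 periods, but with a different operation.
  5. All those values that are in at least one of the past 36 periods, but with a payment below $250.

So, this is my query so far (I'm using period '1109'):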

    CREATE OR REPLACE VIEW hired_fired AS
    WITH query_hired_fired AS ( 
        SELECT work_zone, operation, sub_op, employee_id,
            CASE 
                WHEN operation = 97 THEN 
                    CASE 
                        WHEN sub_op IN (1,3,5) THEN 'Cookers' 
                        WHEN sub_op IN (2,6) THEN 'Waitress' 
                        WHEN sub_op IN (4,7,8,9,10) THEN 'Cashier'
                        WHEN sub_op = 11 THEN 'Security' 
                        WHEN sub_op IN (12,13) THEN 'Cleaners' 
                        ELSE 'Others' 
                    END
            END AS opgroup
        FROM employee 
        WHERE period = 1109 AND payment >= 250 AND operation = 97
    )
    SELECT 201109 AS periodo, opgroup, work_zone,
    (SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id NOT IN (SELECT employee_id FROM employee WHERE period = 1108 AND payment >= 250 AND operation = 97)) AS total, 
    (SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id NOT IN (SELECT employee_id FROM employee WHERE period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation = 97)) AS absolut,
    (SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id IN (SELECT employee_id FROM employee WHERE period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation = 97)) AS reincorporated,
    (SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id IN (SELECT employee_id FROM employee WHERE period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation != 97)) AS operation_change,
    (SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id IN (SELECT employee_id FROM employee WHERE period BETWEEN 0808 AND 1108 AND payment < 250 AND operation = 97)) AS raised
    FROM query_hired_fired
    GROUP BY work_zone, opgroup
    

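So, my question is... is there any way I can run this query without all the subqueries? I think it would take hours to run, which is not acceptable given how heavily this table is used.

Sorry if I have been unclear about anything; I'll answer any questions and doubts as soon as I can. Thanks.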

2 answers:

Answer 0 (score: 1)

Try this query:

WITH query_hired_fired AS ( 
    -- period and payment are selected too, because the flags below test them
    SELECT work_zone, operation, sub_op, employee_id, period, payment,
        CASE 
            WHEN operation = 97 THEN 
                CASE 
                    WHEN sub_op IN (1,3,5) THEN 'Cookers' 
                    WHEN sub_op IN (2,6) THEN 'Waitress' 
                    WHEN sub_op IN (4,7,8,9,10) THEN 'Cashier'
                    WHEN sub_op = 11 THEN 'Security' 
                    WHEN sub_op IN (12,13) THEN 'Cleaners' 
                    ELSE 'Others' 
                END
        END AS opgroup
    FROM employee 
)
SELECT opgroup, work_zone,
       -- the question's NOT IN counts are employees whose flag is 0, hence (1 - flag)
       SUM( x_period_1109 * (1 - x_total) )      As total,
       SUM( x_period_1109 * (1 - x_absolut) )    As absolut,
       SUM( x_period_1109 * x_reincorporated )   As reincorporated,
       SUM( x_period_1109 * x_operation_change ) As operation_change,
       SUM( x_period_1109 * x_raised )           As raised
FROM (
    SELECT opgroup, work_zone, employee_id,
           MAX( CASE WHEN period = 1108 AND payment >= 250 AND operation = 97 THEN 1 ELSE 0 END ) as x_total,
           -- absolut looks at the whole 0808 to 1108 window, not only at 1108
           MAX( CASE WHEN period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation = 97 THEN 1 ELSE 0 END ) as x_absolut,
           MAX( CASE WHEN period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation = 97 THEN 1 ELSE 0 END ) as x_reincorporated,
           MAX( CASE WHEN period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation != 97 THEN 1 ELSE 0 END ) as x_operation_change,
           MAX( CASE WHEN period BETWEEN 0808 AND 1108 AND payment < 250 AND operation = 97 THEN 1 ELSE 0 END ) as x_raised,
           MAX( CASE WHEN period = 1109 AND payment >= 250 AND operation = 97 THEN 1 ELSE 0 END ) As x_period_1109
    FROM query_hired_fired
    WHERE period BETWEEN 0808 AND 1109
    GROUP BY opgroup, work_zone, employee_id
) x
GROUP BY work_zone, opgroup

This condition in your query, BETWEEN 1108 AND 0808, always evaluates to false; I think it should be BETWEEN 0808 AND 1108.
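
The reason is that x BETWEEN low AND high is shorthand for x >= low AND x <= high, so no value can satisfy it when the bounds are swapped. Here is a small sketch that checks this, using SQLite through Python's sqlite3 module as a stand-in for PostgreSQL (the semantics of plain BETWEEN are the same); `between` is a hypothetical helper and 1000 is an arbitrary period inside the window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def between(value: int, low: int, high: int) -> bool:
    """Evaluate `value BETWEEN low AND high` in the database engine."""
    row = conn.execute("SELECT ? BETWEEN ? AND ?", (value, low, high)).fetchone()
    return bool(row[0])


# 0808 is read as the integer 808, so the bounds below are 808 and 1108.
reversed_bounds = between(1000, 1108, 808)   # 1000 >= 1108 AND 1000 <= 808
ordered_bounds = between(1000, 808, 1108)    # 1000 >= 808 AND 1000 <= 1108

print(reversed_bounds, ordered_bounds)  # prints: False True
```

PostgreSQL also offers BETWEEN SYMMETRIC, which sorts the two bounds before comparing, but writing the bounds in ascending order is the simpler fix here.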

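Answer 1 (score: 1)

Mine is somewhat similar to Kordirko's, but rolled into one. The premise of the inner "PreCalc" query is to compute one row per employee, with each flag set to 1 or 0 depending on whether a condition is met. Since all of your conditions are based on a range, either just 1108 or between 0808 and 1108, the subquery only needs to fetch the records between 0808 and 1108, which keeps the CASE/WHEN conditions simple and readable. The only place I test the period is the first condition, where you are specifically looking for the exact previous period. Beyond that, the remaining items are qualifiers on the payment amount and on whether the operation is (or is not) 97. So for any employee, each flag is set to 1 or 0 accordingly.

That then feeds the outer query, which does the SUM / CASE. To account for your "NOT IN", I look for a given flag = 0 (the employee did NOT qualify against the underlying data) versus flag = 1 (the employee DID qualify).

Since the pre-query also computes the "opgroup", everything is wrapped up in one pass.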

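I would make sure your table has an index on (period, employee_id, work_zone) to help optimize this. You could add further columns to the index key to make it a covering index, but first see how this works.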
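
As a rough sketch of that suggestion, the statement below creates the index on an empty copy of the table and prints the plan for a range scan over the period. It runs on SQLite through Python's sqlite3 module only so that it is self-contained; the CREATE INDEX statement itself is also valid PostgreSQL. The index name idx_employee_period_emp_zone and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE employee (
           period INTEGER, employee_id INTEGER, operation INTEGER,
           sub_op INTEGER, payment INTEGER, work_zone TEXT)"""
)

# Index on the columns named in the answer; the name is made up for this sketch.
conn.execute(
    "CREATE INDEX idx_employee_period_emp_zone "
    "ON employee (period, employee_id, work_zone)"
)

index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'employee'"
    )
]
print(index_names)

# Show how the engine plans a range scan over the period window.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT employee_id, work_zone FROM employee "
    "WHERE period BETWEEN 808 AND 1108"
).fetchall()
for step in plan:
    print(step)
```

On PostgreSQL, EXPLAIN (or EXPLAIN ANALYZE) plays the role of EXPLAIN QUERY PLAN. A covering variant would also carry operation, sub_op and payment, either as extra key columns or, from PostgreSQL 11 on, in an INCLUDE clause.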

SELECT
      201109 AS periodo, 
      work_zone, 
      opgroup,
      -- PostgreSQL compares with "=", and each CASE needs a THEN branch to count
      SUM( case when PreCalc.LPOver250 = 0 then 1 else 0 end ) as EmpsNotInLastPeriodOver250,
      SUM( case when PreCalc.Over250Op97 = 0 then 1 else 0 end ) as EmpsNotInOver250Per97,
      SUM( case when PreCalc.Over250Op97 = 1 then 1 else 0 end ) as EmpsInOver250Per97,
      SUM( case when PreCalc.Over250NotOp97 = 1 then 1 else 0 end ) as EmpsOver250NotInOp97,
      SUM( case when PreCalc.Under250 = 1 then 1 else 0 end ) as EmpsUnder250
   from
      ( SELECT
              Employee_ID,
              work_zone, 
              CASE WHEN operation = 97 THEN 
                   CASE WHEN sub_op IN (1,3,5) THEN 'Cookers' 
                        WHEN sub_op IN (2,6) THEN 'Waitress' 
                        WHEN sub_op IN (4,7,8,9,10) THEN 'Cashier'
                        WHEN sub_op = 11 THEN 'Security' 
                        WHEN sub_op IN (12,13) THEN 'Cleaners' 
                        ELSE 'Others' 
                   END
              END AS opgroup, 
              MAX( case when period = 1108 
                        and payment >= 250 
                        and operation = 97 then 1 else 0 end ) as LPOver250,
              MAX( case when payment >= 250 
                        and operation = 97 then 1 else 0 end ) as Over250Op97,
              MAX( case when payment >= 250 
                        and operation != 97 then 1 else 0 end ) as Over250NotOp97,
              MAX( case when payment < 250 
                        and operation = 97 then 1 else 0 end ) as Under250
           from
              employee 
           where
              period between 0808 and 1108
           group by
              Employee_ID,
              work_zone, 
              opgroup ) PreCalc
   group by
      work_zone, 
      opgroup
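
The flag idea that both answers rely on can be tried on a toy data set. What follows is only a sketch under several assumptions: the rows are invented, SQLite (via Python's sqlite3) stands in for PostgreSQL, the opgroup breakdown is left out for brevity, each employee is assumed to have a single work_zone in the current period, and the names flags, in_current, in_last, in_range, other_op and low_pay are hypothetical. It reads the table once, builds one row of 0/1 flags per employee, and then sums the flags for the employees who qualify in period 1109.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE employee (
           period INTEGER, employee_id INTEGER, operation INTEGER,
           sub_op INTEGER, payment INTEGER, work_zone TEXT)"""
)
rows = [  # (period, employee_id, operation, sub_op, payment, work_zone)
    (1109, 1, 97, 1, 300, "A"), (1108, 1, 97, 1, 300, "A"),  # also in last period
    (1109, 2, 97, 1, 300, "A"),                              # never seen before
    (1109, 3, 97, 1, 300, "A"), (1001, 3, 97, 1, 300, "A"),  # seen earlier in window
    (1109, 4, 97, 1, 300, "A"), (1105, 4, 50, 1, 300, "A"),  # other operation before
    (1109, 5, 97, 1, 300, "A"), (1107, 5, 97, 1, 100, "A"),  # low payment before
    (1108, 6, 97, 1, 300, "A"),                              # not in current period
]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?)", rows)

QUERY = """
WITH flags AS (
    -- one pass over the window: one row of 0/1 flags per employee
    SELECT employee_id,
           MAX(CASE WHEN period = 1109 AND payment >= 250 AND operation = 97
                    THEN work_zone END)            AS work_zone,
           MAX(CASE WHEN period = 1109 AND payment >= 250 AND operation = 97
                    THEN 1 ELSE 0 END)             AS in_current,
           MAX(CASE WHEN period = 1108 AND payment >= 250 AND operation = 97
                    THEN 1 ELSE 0 END)             AS in_last,
           MAX(CASE WHEN period BETWEEN 808 AND 1108 AND payment >= 250
                         AND operation = 97 THEN 1 ELSE 0 END)  AS in_range,
           MAX(CASE WHEN period BETWEEN 808 AND 1108 AND payment >= 250
                         AND operation <> 97 THEN 1 ELSE 0 END) AS other_op,
           MAX(CASE WHEN period BETWEEN 808 AND 1108 AND payment < 250
                         AND operation = 97 THEN 1 ELSE 0 END)  AS low_pay
    FROM employee
    WHERE period BETWEEN 808 AND 1109
    GROUP BY employee_id
)
SELECT work_zone,
       SUM(1 - in_last)  AS total,             -- NOT IN the previous period
       SUM(1 - in_range) AS absolut,           -- NOT IN the whole window
       SUM(in_range)     AS reincorporated,
       SUM(other_op)     AS operation_change,
       SUM(low_pay)      AS raised
FROM flags
WHERE in_current = 1
GROUP BY work_zone
"""

result = conn.execute(QUERY).fetchall()
print(result)  # prints: [('A', 4, 3, 2, 1, 1)]
```

Grouping by employee_id alone in the inner step keeps all of an employee's history in one row, so a past row with a different operation or sub_op still sets its flag on the same row as the current-period match.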