Slow performance when using COUNT in a set-based query

Date: 2016-08-01 14:04:46

Tags: sql postgresql

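I have the following stored procedure, which evaluates all the accounts in a staging table and determines whether they are suitable for import. If they are, it marks them with suitableToImport = TRUE. If not, a reason is recorded.

Although it is set-based, it is still very slow. I tried moving to EXISTS instead of COUNT, but my tests did not suggest that it makes much difference.

Any suggestions on what can be done?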

CREATE OR REPLACE FUNCTION assessInclusionOfAccountsFromStaging () RETURNS BOOLEAN AS $$ /*Only new accounts are valid, detailed issues checking and budge checking duplicates*/

DECLARE
    countOfAccountsInStaging INTEGER;
BEGIN

    /*Check that we have data to process*/
    countOfAccountsInStaging = COUNT(*) FROM importAccountsStaging;
    IF(countOfAccountsInStaging) = 0 THEN
        RAISE EXCEPTION 'No accounts available';
    END IF; 

    /*SET SuitableToImport*/
    RAISE NOTICE 'Processing accounts...';
    UPDATE importAccountsStaging SET suitableToImport = TRUE 
    WHERE 
        ((SELECT COUNT (*) FROM importAccountsStaging as accountsIterated /*Check for duplicates against staging environment at org level*/
            WHERE 
                (accountsIterated.code1 = importAccountsStaging.code1)
                OR (accountsIterated.code2 = importAccountsStaging.code2)
            )=1)

        /*Check for duplicate in masterdb*/
        AND ((SELECT COUNT (*) FROM masterAccounts /*Check for any potential duplicate at org level*/
            WHERE 
                (importAccountsStaging.code1 = masterAccounts.code1 )
                OR (importAccountsStaging.code2 = masterAccounts.code2 )
            )=0)
        ;

        /*SET COMMENT on why it's not suitable to import*/
        UPDATE importAccountsStaging SET reason = CONCAT(reason , 'existing account in staging|')
        WHERE
            NOT ((SELECT COUNT (*) FROM importAccountsStaging as tempAccounts 
            WHERE 
                tempAccounts.code1 = importAccountsStaging.code1
                OR tempAccounts.code2 = importAccountsStaging.code2 
            )=1);


            /*SET COMMENT on why it's not suitable to import*/
        UPDATE importAccountsStaging SET reason = CONCAT(reason , 'existing account in main|')
        WHERE
        NOT ((SELECT COUNT (*) FROM masterAccounts
        WHERE 
            importAccountsStaging.code1 = masterAccounts.code1
            OR importAccountsStaging.code2 = masterAccounts.code2
        )=0)
        ;

    /*Return values*/
RAISE NOTICE 'Assessment completed human! ';
RETURN  TRUE;
END; $$ 
LANGUAGE plpgsql;

Many thanks!

4 answers:

Answer 0: (score: 1)

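This is a known anti-pattern. COUNT(*) can be a very slow operation because it has to visit every matching row. A test based on EXISTS should be much faster, because execution stops at the first matching row. So never use COUNT to test whether something exists or not. Always use EXISTS.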
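As a minimal sketch of the difference, using the table and column names from the question (the SELECT form is only for illustration; the same predicates can go in the WHERE clause of an UPDATE):

```sql
-- COUNT form: every matching row in masterAccounts is counted
-- before the result is compared with zero.
SELECT s.code1
FROM importAccountsStaging AS s
WHERE (SELECT COUNT(*)
       FROM masterAccounts AS m
       WHERE m.code1 = s.code1) = 0;

-- EXISTS form: the subquery can stop as soon as one match is found.
SELECT s.code1
FROM importAccountsStaging AS s
WHERE NOT EXISTS (SELECT 1
                  FROM masterAccounts AS m
                  WHERE m.code1 = s.code1);
```

Both queries return the staging rows whose code1 does not appear in masterAccounts; how much faster the second one is depends on the data and the available indexes.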

Answer 1: (score: 0)

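The OR in the correlated clause is a killer: it can cause a full table scan for every record in the table being updated.

Assuming you are only looking for existence rather than an actual count, I would suggest: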

WHERE (EXISTS (SELECT 1
               FROM importAccountsStaging ias
               WHERE ias.code1 = importAccountsStaging.code1
              ) OR
       EXISTS (SELECT 1
               FROM importAccountsStaging ias
               WHERE ias.code2 = importAccountsStaging.code2
              )
      ) AND
      (NOT EXISTS (SELECT 1
                   FROM masterAccounts ma
                   WHERE importAccountsStaging.code1 = ma.code1 
                  ) AND
       NOT EXISTS (SELECT 1
                   FROM masterAccounts ma
                   WHERE importAccountsStaging.code2 = ma.code2
                  ) 
      )

You then want indexes on importAccountsStaging(code1), importAccountsStaging(code2), masterAccounts(code1), and masterAccounts(code2).
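A sketch of those four indexes follows. The index names are made up for illustration; only the table and column names come from the question.

```sql
-- Hypothetical index names; one single-column index per lookup column.
CREATE INDEX idx_staging_code1 ON importAccountsStaging (code1);
CREATE INDEX idx_staging_code2 ON importAccountsStaging (code2);
CREATE INDEX idx_master_code1  ON masterAccounts (code1);
CREATE INDEX idx_master_code2  ON masterAccounts (code2);

-- Refresh planner statistics after loading the staging data.
ANALYZE importAccountsStaging;
ANALYZE masterAccounts;
```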

If you are looking for a specific count, you can adapt this logic as well (it should be almost as fast).
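For example, the staging check in the question compares the count with 1 because every row matches itself. One possible way to express "no other row has the same code" with EXISTS is sketched below. It assumes PostgreSQL and uses the system column ctid to tell rows apart, only because the question does not show a primary key; a real key column would be the better choice if one exists.

```sql
-- Flag staging rows whose code1 also appears on a different staging row.
-- ctid is used as a stand-in row identifier (assumption: no key column is known).
UPDATE importAccountsStaging AS s
SET reason = CONCAT(s.reason, 'existing code1 in staging|')
WHERE EXISTS (SELECT 1
              FROM importAccountsStaging AS d
              WHERE d.code1 = s.code1
                AND d.ctid <> s.ctid);
```

The same statement with code2 in place of code1 would cover the second column.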

Answer 2: (score: 0)

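You may want to consider a complete redesign to get rid of the OR entirely. I strongly suspect that if you break the operation into smaller chunks, it will run faster. For example, instead of the SELECT COUNT(*) on masterAccounts, why not do this: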

UPDATE importAccountsStaging SET reason = CONCAT(reason, 'existing account in main|')
FROM masterAccounts WHERE importAccountsStaging.code1 = masterAccounts.code1;

UPDATE importAccountsStaging SET reason = CONCAT(reason, 'existing account in main|')
FROM masterAccounts WHERE importAccountsStaging.code2 = masterAccounts.code2;

Do the same for your other checks, and then just finish with:

UPDATE importAccountsStaging SET suitableToImport = TRUE WHERE reason IS NULL;

Answer 3: (score: 0)

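Here is the optimized query. I took input from all the answers, so thank you very much for your help. These changes brought the execution time down from more than 10 hours to about 7 minutes.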

CREATE OR REPLACE FUNCTION assessInclusionOfAccountsFromStaging () RETURNS BOOLEAN AS $$ /*Only new accounts are valid, detailed issues checking and budge checking duplicates*/

DECLARE
    countOfAccountsInStaging INTEGER;
BEGIN

    /*Check that we have data to process*/
    countOfAccountsInStaging = COUNT(*) FROM importAccountsStaging;
    IF(countOfAccountsInStaging) = 0 THEN
        RAISE EXCEPTION 'No accounts available';
    END IF; 

    RAISE NOTICE 'Processing accounts...';
    /*Checking value of row against the table the row belongs to for potential duplicates*/
    UPDATE importAccountsStaging SET reason = CONCAT(importAccountsStaging.reason , 'existing code1 in staging|')
    WHERE ((SELECT COUNT (*) FROM importAccountsStaging as tempAccounts 
        WHERE 
            tempAccounts.code1 = importAccountsStaging.code1
        )>1);
    UPDATE importAccountsStaging SET reason = CONCAT(importAccountsStaging.reason , 'existing code2 in staging|')
    WHERE ((SELECT COUNT (*) FROM importAccountsStaging as tempAccounts 
        WHERE 
            tempAccounts.code2 = importAccountsStaging.code2
        )>1);

    /*Checking value of row against another table*/
    UPDATE importAccountsStaging SET reason = CONCAT(importAccountsStaging.reason , 'existing code1 in masterDB|')
    FROM masterAccounts  WHERE importAccountsStaging.code1 = masterAccounts.code1;
    UPDATE importAccountsStaging SET reason = CONCAT(importAccountsStaging.reason , 'existing code2 in masterDB|')
    FROM masterAccounts  WHERE importAccountsStaging.code2 = masterAccounts.code2;

    /*Final flag where no issues were found*/
    UPDATE importAccountsStaging SET suitableToImport = TRUE 
    WHERE reason IS NULL;

    /*Return values*/
RAISE NOTICE 'Assessment complete, all done! ';
RETURN  TRUE;
END; $$ 
LANGUAGE plpgsql;
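
The two staging self-checks above still run a correlated COUNT(*) for every row. As a possible further step, not part of the posted solution and not measured here, the duplicated codes could be computed once with GROUP BY and then joined back, in the same UPDATE ... FROM style used for masterAccounts:

```sql
-- Compute the duplicated code1 values once, then flag the matching staging rows.
UPDATE importAccountsStaging AS s
SET reason = CONCAT(s.reason, 'existing code1 in staging|')
FROM (SELECT code1
      FROM importAccountsStaging
      GROUP BY code1
      HAVING COUNT(*) > 1) AS dup
WHERE s.code1 = dup.code1;
```

The code2 check would follow the same pattern. Rows with a NULL code1 are not flagged by either version, since NULL never compares equal to anything.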