Query times out, sorting millions of rows when only tens of thousands should match

Time: 2017-07-29 19:49:26

Tags: db2 sql-tuning

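I'm trying to formulate a query that finds all users who have had no transactions within a specified time period. My problem is that the query behaves as if it were caught in a nested loop. I'm trying to figure out where my logic is flawed.

I can't post the actual query, because it's for work, but here is an example that uses a similar structure. (Yes, the balance/transaction data is spread across two pairs of tables... that's what I have to work with.)

Given the schema: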

Users          Balances_A       Transactions_A
user_id        account_id <-\    transaction_id
ssn <------+-- ssn           \-- account_id
occupation |   balance           amount
name       |   type              trdate
address    |   department                  
           |                     
           |
           |   Balances_B       Transactions_B
           |   account_id <-\    transaction_id
           +-- ssn           \-- account_id
               balance           amount
               code (*)          trdate
               department        


* same as type, just different field name.
Note: each "<---" indicates a 1 to many relationship

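Task: find all users who have an account with type='A' and department='1' whose current balance is 0.00 and which has had a type 'A' transaction within the last year. I also need the balance of all *other* account types, excluding types 'X' and 'Y'.

Parameters: department='1', type='A', transaction date < one year ago, balance=0.00

Here is what I tried: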

SELECT 
    u.user_id, u.name, u.address, u.ssn, 
    account_balances_a.other_balance, 
    account_balances_b.other_balance, 
    last_transaction_a.last_transaction_date,
    last_transaction_b.last_transaction_date

FROM users AS u

-- attach other balance total from A
LEFT JOIN ( SELECT bal_a.ssn, SUM(bal_a.balance) as other_balance
            FROM balances_a as bal_a
            WHERE bal_a.type NOT IN ('A','X','Y') AND bal_a.department='1'
            GROUP BY bal_a.ssn
          ) AS account_balances_a
            ON u.ssn = account_balances_a.ssn

-- attach other balance total from B
LEFT JOIN ( SELECT bal_b.ssn, SUM(bal_b.balance) as other_balance
            FROM balances_b as bal_b
            WHERE bal_b.code NOT IN ('A','X','Y') AND bal_b.department='1'
            GROUP BY bal_b.ssn
          ) AS account_balances_b
            ON u.ssn = account_balances_b.ssn


-- regular join balance A table

, balances_a AS ba

-- attach last transaction date ( transactions A )
LEFT JOIN ( SELECT temp1.account_id, MAX(temp1.trdate) as last_transaction_date
            FROM transactions_a as temp1
            GROUP BY temp1.account_id
          ) AS last_transaction_a
            ON last_transaction_a.account_id = ba.account_id

-- regular join balance B table

, balances_b AS bb

-- attach last transaction date ( transactions B )
LEFT JOIN ( SELECT temp2.account_id, MAX(temp2.trdate) as last_transaction_date
            FROM transactions_b as temp2
            GROUP BY temp2.account_id
          ) AS last_transaction_b
            ON last_transaction_b.account_id = bb.account_id

WHERE

    u.occupation='ditch digger'

    -- user has an account type 'A' with department '1' in the specified time frame:
    AND (
            -- either in Balance A table, 
            ( u.ssn=ba.ssn AND ba.balance=0.00 AND ba.type='A' AND ba.department='1' and last_transaction_a.last_transaction_date>'$one_year-ago' )
            OR
            -- or in Balance B table
            ( u.ssn=bb.ssn AND bb.balance=0.00 AND bb.code='A' AND bb.department='1' and last_transaction_b.last_transaction_date>'$one_year-ago' )
        )

ORDER BY last_transaction_a.last_transaction_date

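The problem seems to be in the WHERE clause; if I comment out either the "in Balance A table" branch or the "in Balance B table" branch, the query works. With both in place, it tries to sort millions of records.

After typing this out, I think I see why it fails; but if you take the time to think it through with me and can explain clearly why it fails, I'd appreciate it.

2 answers:

Answer 0: (score: 1)

Because balances_a and transactions_a have to be joined to each other first, before they are joined to users. Otherwise you get 2 cross joins (really poor performance, because every table is scanned and multiplied by all the rows of the others => you use OR in your where clause).

Try modifying your query like this: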
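The row multiplication described here can be reproduced on a toy data set. The following is only a sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for DB2 and a cut-down, hypothetical version of the schema (only ssn and balance are kept). It compares the comma-join-plus-OR pattern with a version where each balances table is joined to users on ssn.

```python
import sqlite3

# Hypothetical, cut-down tables: 10 users, one account per user in each table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users      (ssn TEXT, name TEXT);
    CREATE TABLE balances_a (account_id INTEGER, ssn TEXT, balance REAL);
    CREATE TABLE balances_b (account_id INTEGER, ssn TEXT, balance REAL);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(f"ssn{i}", f"user{i}") for i in range(10)])
# Only ssn0 has a zero balance in A, and only ssn1 has one in B.
conn.executemany("INSERT INTO balances_a VALUES (?, ?, ?)",
                 [(i, f"ssn{i}", 0.0 if i == 0 else 50.0) for i in range(10)])
conn.executemany("INSERT INTO balances_b VALUES (?, ?, ?)",
                 [(100 + i, f"ssn{i}", 0.0 if i == 1 else 50.0) for i in range(10)])

# Comma joins plus OR: when one branch is true, the table named only in
# the other branch is unconstrained, so every one of its rows pairs up.
or_rows = conn.execute("""
    SELECT COUNT(*)
    FROM users u, balances_a ba, balances_b bb
    WHERE (u.ssn = ba.ssn AND ba.balance = 0.0)
       OR (u.ssn = bb.ssn AND bb.balance = 0.0)
""").fetchone()[0]

# Each balances table joined to users on ssn before the OR is evaluated.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM users u
    LEFT JOIN balances_a ba ON ba.ssn = u.ssn AND ba.balance = 0.0
    LEFT JOIN balances_b bb ON bb.ssn = u.ssn AND bb.balance = 0.0
    WHERE ba.ssn IS NOT NULL OR bb.ssn IS NOT NULL
""").fetchone()[0]

print(or_rows, joined_rows)
```

Only two users qualify, yet the first query returns each of them once per row of the other balances table. With millions of rows in the real tables, that is the set the ORDER BY then has to sort.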

SELECT 
    u.user_id, u.name, u.address, u.ssn, 
    account_balances_a.other_balance, 
    account_balances_b.other_balance, 
    last_transaction_a.last_transaction_date,
    last_transaction_b.last_transaction_date

FROM users AS u

LEFT OUTER JOIN LATERAL
 ( 
  SELECT SUM(bal_a.balance) as other_balance FROM balances_a as bal_a
  WHERE bal_a.department='1' and u.ssn = bal_a.ssn and bal_a.type NOT IN ('A','X','Y')
 ) account_balances_a on 1=1

LEFT OUTER JOIN LATERAL
 (
  SELECT SUM(bal_b.balance) as other_balance FROM balances_b as bal_b
  WHERE bal_b.department='1' and u.ssn = bal_b.ssn and bal_b.code NOT IN ('A','X','Y')
 ) account_balances_b on 1=1

LEFT OUTER JOIN LATERAL
 (
  SELECT MAX(temp1.trdate) as last_transaction_date
  FROM transactions_a as temp1 inner join balances_a ba on temp1.account_id = ba.account_id
  WHERE u.ssn = ba.ssn and ba.type='A' and ba.balance=0.00 and ba.department='1'  
 ) last_transaction_a on last_transaction_a.last_transaction_date>current date - 1 year

LEFT OUTER JOIN LATERAL
 ( 
  SELECT MAX(temp2.trdate) as last_transaction_date
  FROM transactions_b as temp2 inner join balances_b bb on temp2.account_id = bb.account_id
  where u.ssn=bb.ssn AND bb.code='A' AND bb.balance=0.00 AND bb.department='1' 
 ) last_transaction_b on last_transaction_b.last_transaction_date>current date - 1 year

WHERE u.occupation='ditch digger' 
AND (last_transaction_a.last_transaction_date is not null or last_transaction_b.last_transaction_date is not null)

ORDER BY last_transaction_a.last_transaction_date

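Answer 1: (score: 0)

Thanks to Esperento57's answer, I went back and re-studied "cross joins". I had forgotten that a query over multiple (comma-separated) tables starts out as a cross join; I was essentially cross joining 3 tables in the from clause. (At least, that was my intent.) It was then up to the where clause to join them properly.

...and obviously it didn't do that.

In my mind, all the tables were tied together by users.ssn. So it would iterate through users (with the various filters on balances a & b tied to it), and everything should work fine.

[eureka moment]

...then it iterates through balances_a, and everything goes very wrong. The where clause did not tie the tables together anywhere near the way I had imagined. The OR caused a cross join between balances_a and users.

If that weren't bad enough, it then starts the whole thing over again with balances_b.

This led me to the troubleshooting concept I was looking for. Whether or not this is how the database actually works, it seems you can think of each comma-separated table as iterating over all of its rows (i.e. a cross join). The where clause has to hold up for every iteration of every comma-separated table.

Since this query was such a miserable failure, I started over and found that it works much better to do a union on the (filtered) balances, then left join users, along with the sum(balance) and max(date).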
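That mental model can be written out as plain nested iteration. This is a sketch of the way of thinking only, not of how DB2 executes the query; the three small lists and the where() helper are hypothetical stand-ins for the tables and the OR condition.

```python
from itertools import product

# Hypothetical stand-ins for the three comma-separated tables.
users = [{"ssn": f"ssn{i}"} for i in range(3)]
balances_a = [{"ssn": f"ssn{i}", "balance": 0.0 if i == 0 else 50.0} for i in range(3)]
balances_b = [{"ssn": f"ssn{i}", "balance": 0.0 if i == 1 else 50.0} for i in range(3)]

def where(u, ba, bb):
    # Same shape as the OR in the question: each branch constrains
    # only one of the two balances tables.
    return ((u["ssn"] == ba["ssn"] and ba["balance"] == 0.0)
            or (u["ssn"] == bb["ssn"] and bb["balance"] == 0.0))

combinations_checked = 0
result = []
# "FROM users, balances_a, balances_b" read as three nested loops.
for u, ba, bb in product(users, balances_a, balances_b):
    combinations_checked += 1
    if where(u, ba, bb):
        result.append((u["ssn"], ba["ssn"], bb["ssn"]))

print(combinations_checked, len(result))
```

The condition is checked for every combination of rows, and a user that matches through balances_a is emitted once for every row of balances_b (and the other way round), because nothing in that branch restricts the other table.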
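The final query was not posted, so the following is only one possible reading of that description, sketched with SQLite through Python's sqlite3 module rather than DB2. The CTE names (zero_accounts, last_tx, other), the sample rows, and the cutoff date are all hypothetical; only the table and column names come from the schema in the question. An inner join to users is used here because the occupation filter would discard unmatched rows anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (ssn TEXT, name TEXT, occupation TEXT);
    CREATE TABLE balances_a (account_id INTEGER, ssn TEXT, balance REAL, type TEXT, department TEXT);
    CREATE TABLE balances_b (account_id INTEGER, ssn TEXT, balance REAL, code TEXT, department TEXT);
    CREATE TABLE transactions_a (account_id INTEGER, trdate TEXT);
    CREATE TABLE transactions_b (account_id INTEGER, trdate TEXT);

    INSERT INTO users VALUES ('111','Ann','ditch digger'), ('222','Bob','ditch digger'), ('333','Cy','pilot');
    INSERT INTO balances_a VALUES (1,'111',0.0,'A','1'), (2,'111',25.0,'C','1'), (3,'333',0.0,'A','1');
    INSERT INTO balances_b VALUES (10,'222',0.0,'A','1'), (11,'222',40.0,'D','1'), (12,'111',5.0,'X','1');
    INSERT INTO transactions_a VALUES (1,'2017-06-01'), (1,'2017-01-15'), (3,'2017-05-01');
    INSERT INTO transactions_b VALUES (10,'2015-03-01');
""")

query = """
WITH zero_accounts AS (      -- filtered balances from both table pairs
    SELECT ba.ssn, ta.trdate
    FROM balances_a ba JOIN transactions_a ta ON ta.account_id = ba.account_id
    WHERE ba.type = 'A' AND ba.department = '1' AND ba.balance = 0.0
    UNION ALL
    SELECT bb.ssn, tb.trdate
    FROM balances_b bb JOIN transactions_b tb ON tb.account_id = bb.account_id
    WHERE bb.code = 'A' AND bb.department = '1' AND bb.balance = 0.0
),
last_tx AS (                 -- max(date) per user
    SELECT ssn, MAX(trdate) AS last_transaction_date
    FROM zero_accounts GROUP BY ssn
),
other AS (                   -- sum(balance) of the other account types
    SELECT ssn, SUM(balance) AS other_balance FROM (
        SELECT ssn, balance FROM balances_a
        WHERE type NOT IN ('A','X','Y') AND department = '1'
        UNION ALL
        SELECT ssn, balance FROM balances_b
        WHERE code NOT IN ('A','X','Y') AND department = '1'
    ) GROUP BY ssn
)
SELECT u.ssn, u.name, lt.last_transaction_date, o.other_balance
FROM last_tx lt
JOIN users u ON u.ssn = lt.ssn
LEFT JOIN other o ON o.ssn = lt.ssn
WHERE u.occupation = 'ditch digger'
  AND lt.last_transaction_date > :cutoff
ORDER BY lt.last_transaction_date
"""

# Hypothetical cutoff: one year before the date of the post.
rows = conn.execute(query, {"cutoff": "2016-07-29"}).fetchall()
print(rows)
```

Every join here is on a key (account_id or ssn), so no table is left unconstrained, and the sort only sees users that already passed the filters.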