Oracle - Querying a large data set takes a very long time - is there a way to optimize it?

Time: 2014-02-26 14:43:44

Tags: sql database oracle

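Suppose I have a very large table in my Oracle database that holds data for thousands of items. The data is updated very frequently throughout the day, and every update gets a time stamp.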

So, for example, the table looks like this (I know the column names are poor; this is only for illustration):

TBLDaily:

Date:         ItemNo:     CharA:  ....  CharN:    Time_Stamp:
2014/02/15    123         ....                    2014/02/15 10:00AM
2014/02/15    123         ....                    2014/02/15 11:00AM
2014/02/15    123         ....                    2014/02/15 02:13PM
2014/02/15    234         ....                    2014/02/20 01:00PM
2014/02/15    234         ....                    2014/02/20 09:00PM
   ...
2014/02/16    123         ....                    2014/02/20 08:15PM
   ...

Then I have a table with the same item numbers that stores other information, but it stays static throughout the month, so it looks like this:

TBLMonthly:

Date:          ItemNo:    CharA:   .... CharK:
2014/01/31     123        ....          
2014/01/31     234        ....          
2013/12/31     123        ....          
2013/12/31     234        ....          
  ...

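Now I need to get, for each item number and for each date, the latest information available in the daily table, together with certain characteristics which, if they do not exist there, are taken from the monthly table.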

My SQL query looks like this:

WITH All_Data AS
(
  SELECT 
    -- DESC so that RN = 1 is the latest row per (Date, ItemNo), not the earliest
    ROW_NUMBER() OVER(PARTITION BY A.Date, A.ItemNo ORDER BY A.Time_Stamp DESC) AS RN,
    A.Date, A.ItemNo, 
    NVL(A.CharA, B.CharA) AS CharA,
    B.CharB,
    ... whatever other characteristics ...

  FROM 
    TBLDaily A,
    TBLMonthly B

  WHERE
    A.ItemNo = B.ItemNo
  AND
    A.Date BETWEEN To_Date('2012-12-31', 'yyyy-MM-dd') AND To_Date('2014-02-24', 'yyyy-MM-dd') 
  AND
    B.Date = (SELECT max(Date) FROM TBLMonthly WHERE Date <= A.Date)
)

SELECT * 
FROM All_Data 
WHERE RN = 1
ORDER BY Date, ItemNo
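
The "latest row per (Date, ItemNo)" part of the query can be checked on a tiny data set. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Oracle, and the table name `daily` and the lower-case column names are hypothetical, shortened versions of the ones in the question. The point it shows is that RN = 1 only selects the newest row when the window is ordered by the time stamp in descending order.

```python
import sqlite3

# Hypothetical miniature of TBLDaily; names are shortened for the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily (day TEXT, item_no INTEGER, char_a TEXT, time_stamp TEXT);
INSERT INTO daily VALUES
  ('2014-02-15', 123, 'old',  '2014-02-15 10:00'),
  ('2014-02-15', 123, 'mid',  '2014-02-15 11:00'),
  ('2014-02-15', 123, 'new',  '2014-02-15 14:13'),
  ('2014-02-15', 234, 'only', '2014-02-20 13:00');
""")


def latest_rows(conn):
    """Return one row per (day, item_no): the one with the greatest time_stamp."""
    return conn.execute("""
        SELECT day, item_no, char_a, time_stamp
        FROM (
            SELECT d.*,
                   ROW_NUMBER() OVER (PARTITION BY day, item_no
                                      ORDER BY time_stamp DESC) AS rn
            FROM daily d
        )
        WHERE rn = 1
        ORDER BY day, item_no
    """).fetchall()


rows = latest_rows(conn)
for row in rows:
    print(row)
```

With ascending order the same filter would return the 10:00 row for item 123 instead of the 14:13 one.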

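Now, this query takes an extremely long time to complete (I started it yesterday afternoon and it was still executing this morning). I know it is a very large data set, but I have queried much larger data sets much faster. I am guessing it is due to: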

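  1. PARTITION BY
  2. The repeated B.Date = (SELECT max(Date) FROM TBLMonthly WHERE Date <= A.Date)

But I am not sure and, even worse, I have no idea how to fix it so that it is more efficient and does not take so long.

Any ideas/help would be much appreciated!!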
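
One way to probe both suspected costs is to restructure the query so that each is paid once: reduce the daily table to its latest rows before joining, and resolve the matching monthly date once per distinct daily date instead of once per joined row. The sketch below is an assumption-laden illustration, not a tuned Oracle solution: it runs on Python's sqlite3 module as a stand-in, the names `daily`, `monthly`, `latest`, `month_map` and `month_end` are hypothetical, and COALESCE takes the place of NVL. Whether this shape is actually faster on Oracle depends on the execution plan and the available indexes, which this sketch cannot show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily   (day TEXT, item_no INTEGER, char_a TEXT, time_stamp TEXT);
CREATE TABLE monthly (month_end TEXT, item_no INTEGER, char_a TEXT, char_b TEXT);
INSERT INTO daily VALUES
  ('2014-02-15', 123, NULL,    '2014-02-15 10:00'),
  ('2014-02-15', 123, NULL,    '2014-02-15 14:13'),
  ('2014-02-15', 234, 'd-234', '2014-02-20 21:00'),
  ('2014-01-10', 123, 'd-123', '2014-01-10 09:00');
INSERT INTO monthly VALUES
  ('2013-12-31', 123, 'm-dec-123', 'b-dec-123'),
  ('2013-12-31', 234, 'm-dec-234', 'b-dec-234'),
  ('2014-01-31', 123, 'm-jan-123', 'b-jan-123'),
  ('2014-01-31', 234, 'm-jan-234', 'b-jan-234');
""")

QUERY = """
WITH latest AS (              -- step 1: newest daily row per (day, item_no)
    SELECT day, item_no, char_a
    FROM (SELECT d.*,
                 ROW_NUMBER() OVER (PARTITION BY day, item_no
                                    ORDER BY time_stamp DESC) AS rn
          FROM daily d)
    WHERE rn = 1
),
month_map AS (                -- step 2: one monthly date per distinct day
    SELECT x.day,
           (SELECT MAX(m.month_end) FROM monthly m
            WHERE m.month_end <= x.day) AS month_end
    FROM (SELECT DISTINCT day FROM latest) x
)
SELECT l.day, l.item_no,
       COALESCE(l.char_a, m.char_a) AS char_a,   -- fall back to the monthly value
       m.char_b
FROM latest l
JOIN month_map mm ON mm.day = l.day
JOIN monthly m ON m.item_no = l.item_no AND m.month_end = mm.month_end
ORDER BY l.day, l.item_no
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Like the subquery in the question, the month lookup here does not filter on the item number, so it assumes every item has a row for each monthly date.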

2 answers:

Answer 0: (score: 2)

Maybe your query becomes simpler and faster with this approach:

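WITH t AS
(SELECT DISTINCT Date, ItemNo,
   -- An explicit frame is needed: with ORDER BY alone the default frame ends at the current row
   LAST_VALUE(CharA) OVER (PARTITION BY Date, ItemNo ORDER BY Time_Stamp
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS CharA,
   MAX(Time_Stamp) OVER (PARTITION BY Date, ItemNo) AS Time_Stamp
FROM TBLDaily)
SELECT *
FROM t
   -- TBLMonthly as described in the question has no Time_Stamp column; adapt this predicate to your schema
   JOIN TBLMonthly m ON m.ItemNo = t.ItemNo AND t.Time_Stamp = m.Time_Stamp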
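
The window frame matters for LAST_VALUE. When ORDER BY is given without a frame clause, the default frame runs from the start of the partition to the current row, so LAST_VALUE returns the current row's own value (when time stamps are unique) rather than the partition's last one. The sketch below shows the difference on three rows; it uses Python's sqlite3 module as a stand-in for Oracle, which applies the same default frame, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily (day TEXT, item_no INTEGER, char_a TEXT, time_stamp TEXT);
INSERT INTO daily VALUES
  ('2014-02-15', 123, 'a1', '2014-02-15 10:00'),
  ('2014-02-15', 123, 'a2', '2014-02-15 11:00'),
  ('2014-02-15', 123, 'a3', '2014-02-15 14:13');
""")

# Default frame: every row sees itself as the "last" value, so DISTINCT keeps all three.
default_frame = conn.execute("""
    SELECT DISTINCT LAST_VALUE(char_a) OVER (
               PARTITION BY day, item_no ORDER BY time_stamp) AS char_a
    FROM daily
    ORDER BY char_a
""").fetchall()

# Explicit full-partition frame: every row sees the newest value, so DISTINCT keeps one.
full_frame = conn.execute("""
    SELECT DISTINCT LAST_VALUE(char_a) OVER (
               PARTITION BY day, item_no ORDER BY time_stamp
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS char_a
    FROM daily
    ORDER BY char_a
""").fetchall()

print(default_frame)
print(full_frame)
```

An alternative with the same effect is FIRST_VALUE over ORDER BY Time_Stamp DESC, since the default frame always includes the first row of the partition.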

Answer 1: (score: 1)

Perhaps you can create a virtual column on the daily table. It should look something like this:

CREATE OR REPLACE FUNCTION Is_latest(V_item IN NUMBER, V_MONTH IN DATE, V_time_stamp IN DATE) RETURN DATE IS
    last_ts DATE;
BEGIN
    SELECT MAX(time_stamp)
    INTO last_ts
    FROM TBLDaily
    WHERE ItemNo = V_item
        AND DATE = V_MONTH;
    IF last_ts = V_time_stamp THEN
        RETURN trunc(last_ts, 'mm');
    ELSE
        RETURN NULL;
    END IF;
END;

ALTER TABLE TBLDaily ADD month_of_TS GENERATED ALWAYS AS (Is_latest(ItemNo, Date, time_stamp));

CREATE INDEX IND_XXX on TBLDaily (ItemNo, month_of_TS);

Select *
from TBLDaily d
   JOIN TBLMonthly m ON m.ItemNo = d.ItemNo and m.Date = d.month_of_TS