Slow SQL query using a triple self-join

Time: 2010-11-03 14:03:57

Tags: sql sql-server performance tsql

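I have a legacy database with the following table (note: it has no primary key).

It holds one record per accommodation "unit" and date, with the price for that date.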

CREATE TABLE [single_date_availability](
    [accommodation_id] [int],
    [accommodation_unit_id] [int],
    [arrival_date] [datetime],
    [price] [decimal](18, 0),
    [offer_discount] [decimal](18, 0),
    [num_pax] [int],
    [rooms_remaining] [int],
    [eta_available] [int],
    [date_correct] [datetime],
    [max_occupancy] [int],
    [max_adults] [int],
    [min_stay_nights] [int],
    [max_stay_nights] [int],
    [nights_remaining_count] [numeric](2, 0)
) ON [PRIMARY]

The table contains roughly 16,500 records.

But I need to multiply the data out into a completely different format, for example:

  • Accommodation
  • Date
  • Duration
  • Total price

This goes up to the maximum duration for each arrival date.

I am using the following query to achieve this:

SELECT
    MIN(units.MaxAccommodationAvailabilityPax) AS MaxAccommodationAvailabilityPax,
    MIN(units.MaxAccommodationAvailabilityAdults) AS MaxAccommodationAvailabilityAdults,
    StartDate AS DepartureDate,
    EndDate AS ReturnDate,
    DATEDIFF(DAY, StartDate, EndDate) AS Duration,
    MIN(units.accommodation_id) AS AccommodationID, 
    x.accommodation_unit_id AS AccommodationUnitID,
    SUM(Price) AS Price,
    MAX(num_pax) AS Occupancy,
    SUM(offer_discount) AS OfferSaving,
    MIN(date_correct) AS DateTimeCorrect,
    MIN(rooms_remaining) AS RoomsRemaining,
    MIN(CONVERT(int, dbo.IsGreaterThan(ISNULL(eta_available, 0)+ISNULL(nights_remaining_count, 0), 0))) AS EtaAvailable
FROM single_date_availability fp
INNER JOIN (
    /* This gets max availability for the whole accommodation on the arrival date */
    SELECT accommodation_id, arrival_date,
        CASE EtaAvailable WHEN 1 THEN 99 ELSE MaxAccommodationAvailabilityPax END AS MaxAccommodationAvailabilityPax,
        CASE EtaAvailable WHEN 1 THEN 99 ELSE MaxAccommodationAvailabilityAdults END AS MaxAccommodationAvailabilityAdults
    FROM (SELECT accommodation_id, arrival_date, SUM(MaximumOccupancy) MaxAccommodationAvailabilityPax, SUM(MaximumAdults) MaxAccommodationAvailabilityAdults,
            CONVERT(int, WebData.dbo.IsGreaterThan(SUM(EtaAvailable), -1)) AS EtaAvailable                 
            FROM (SELECT accommodation_id, arrival_date, MIN(rooms_remaining*max_occupancy) as MaximumOccupancy,
                    MIN(rooms_remaining*max_adults) as MaximumAdults, MIN(ISNULL(eta_available, 0) + ISNULL(nights_remaining_count, 0) - 1) as EtaAvailable
                    FROM single_date_availability
                    GROUP BY accommodation_id, accommodation_unit_id, arrival_date) a 
            GROUP BY accommodation_id, arrival_date) b
) units ON fp.accommodation_id = units.accommodation_id AND fp.arrival_date = units.arrival_date
INNER JOIN (
    /* This gets every combination of StartDate and EndDate for each Unit/Occupancy */
    SELECT DISTINCT a.accommodation_unit_id, StartDate = a.arrival_date,
        EndDate = b.arrival_date+1, Duration = DATEDIFF(DAY, a.arrival_date, b.arrival_date)+1
        FROM single_date_availability AS a
        INNER JOIN (SELECT accommodation_unit_id, arrival_date FROM single_date_availability) AS b
        ON a.accommodation_unit_id = b.accommodation_unit_id
            AND DATEDIFF(DAY, a.arrival_date, b.arrival_date)+1 >= a.min_stay_nights
            AND DATEDIFF(DAY, a.arrival_date, b.arrival_date)+1 <= (CASE a.max_stay_nights WHEN 0 THEN 28 ELSE a.max_stay_nights END)
) x ON fp.accommodation_unit_id = x.accommodation_unit_id AND fp.arrival_date >= x.StartDate AND fp.arrival_date < x.EndDate
GROUP BY x.accommodation_unit_id, StartDate, EndDate
/* This ensures that all dates between StartDate and EndDate are actually available */
HAVING COUNT(*) = DATEDIFF(DAY, StartDate, EndDate)

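This works and gives me about 413,000 records. I use the results of this query to update another table.

But the query performs very badly, as you might expect with that many self-joins. It takes about 15 seconds to run locally, 1:30 minutes on our test server, and over 30 seconds on our live SQL Server; in all cases it maxes out the CPU while performing the larger joins.

It can be assumed that no other processes access the table at the same time.

I don't mind how long the query takes so much as its demand on the CPU, which causes problems for other queries trying to access other databases/tables at the same time.

I have run the query through the query optimizer and followed all of its recommendations for indexes and statistics.

Any help making this query faster, or at least less CPU-intensive, would be much appreciated. If it needs to be broken down into separate stages, that is acceptable.

To be honest, speed is not that important, since this is a bulk operation run on a table that no other process touches.

I'm not particularly looking for comments on how bad and denormalised this structure is... I already know that :-)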

3 Answers:

Answer 0: (score: 16)

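Answer 1: (score: 2)

This most likely won't solve all of your problems, but try replacing the first pair of conditions below with the second pair: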

AND DATEDIFF(DAY , a.arrival_date , b.arrival_date) + 1 >= a.min_stay_nights
AND DATEDIFF(DAY , a.arrival_date , b.arrival_date) + 1 <= (CASE a.max_stay_nights WHEN 0 THEN 28 ELSE a.max_stay_nights END)

and a.min_stay_nights<=DATEDIFF(DAY , a.arrival_date , b.arrival_date) + 1
and (CASE a.max_stay_nights WHEN 0 THEN 28 ELSE a.max_stay_nights END)>=DATEDIFF(DAY , a.arrival_date , b.arrival_date) + 1

The reason is that, as far as I know, SQL Server doesn't like functions on the left-hand side of the = sign in a WHERE clause.
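
As a related sketch, not a tested fix: which side of the comparison the function sits on generally makes no difference by itself. What tends to matter is whether the column being searched is wrapped in a function, since that can prevent an index seek. The version below rewrites the date-pair subquery from the question so that b.arrival_date appears bare on one side of each comparison. It assumes arrival_date never carries a time-of-day component, and that an index like the one shown exists; the index name IX_sda_unit_arrival is hypothetical. Under those assumptions the range should match the original DATEDIFF conditions.

```sql
-- Hypothetical supporting index; its name and existence are assumptions.
CREATE INDEX IX_sda_unit_arrival
    ON single_date_availability (accommodation_unit_id, arrival_date);

SELECT DISTINCT
    a.accommodation_unit_id,
    StartDate = a.arrival_date,
    EndDate   = DATEADD(DAY, 1, b.arrival_date),
    Duration  = DATEDIFF(DAY, a.arrival_date, b.arrival_date) + 1
FROM single_date_availability AS a
INNER JOIN single_date_availability AS b
    ON  b.accommodation_unit_id = a.accommodation_unit_id
    -- Equivalent to DATEDIFF(...) + 1 >= a.min_stay_nights
    -- when both dates are at midnight (assumption).
    AND b.arrival_date >= DATEADD(DAY, a.min_stay_nights - 1, a.arrival_date)
    -- Equivalent to DATEDIFF(...) + 1 <= max nights
    -- (0 is treated as 28, as in the question).
    AND b.arrival_date <  DATEADD(DAY,
            CASE a.max_stay_nights WHEN 0 THEN 28 ELSE a.max_stay_nights END,
            a.arrival_date);
```

Whether this actually reduces CPU depends on the plan the optimizer picks, so it is worth comparing the actual execution plans and row counts of both versions before relying on it.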

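Answer 2: (score: 1)

Since you say you've run the query optimizer, I can only assume all your indexes are right. My next approach would be to do the join in the application. What does that mean? Instead of having the DB do joins over 100k rows, fetch all of the rows in your application, then use loops and logic to do what you were doing in SQL.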

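The reason is that many applications, for example Facebook, Yahoo and AOL, frown on joins. Joins are not the best thing unless you know they will be fast. In this case, you may want to do the join in the application and then cache the result for future use.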