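My database contains two tables named DomesticSalesOrders and InternationalSalesOrders. Both tables contain more than 100 million rows. Each table has a primary key column named SalesOrderId. The data in the two tables is distinct from each other.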
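Business users want a report with aggregate information about the total number of global sales and the total sales amount. I need to make sure my query executes in the shortest possible time. Which query should I use?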
Option 1:
SELECT
    COUNT(*) AS NumberOfSales,
    SUM( SalesAmount ) AS TotalSalesAmount
FROM
    (
        SELECT
            SalesOrderId,
            SalesAmount
        FROM
            DomesticSalesOrders
        UNION ALL
        SELECT
            SalesOrderId,
            SalesAmount
        FROM
            InternationalSalesOrders
    ) AS p
Option 2:
SELECT
    COUNT(*) AS NumberOfSales,
    SUM( SalesAmount ) AS TotalSalesAmount
FROM
    DomesticSalesOrders
UNION ALL
SELECT
    COUNT(*) AS NumberOfSales,
    SUM( SalesAmount ) AS TotalSalesAmount
FROM
    InternationalSalesOrders
I think both are correct, but I can't see what the difference between them is. Thanks.
Answer 0 (score: 0)
Option 1 is the correct one, because it returns a single row holding the global totals:
NumberOfSales        | TotalSalesAmount
---------------------+--------------------
COUNT( of subquery ) | SUM( of subquery )
Option 2, by contrast, returns two rows, one per table:
NumberOfSales          | TotalSalesAmount
-----------------------+----------------------
COUNT( of subquery 1 ) | SUM( of subquery 1 )
COUNT( of subquery 2 ) | SUM( of subquery 2 )
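To see the difference in result shape concretely, here is a minimal sketch that runs both options against toy tables. It assumes Python's built-in sqlite3 module and a handful of made-up rows rather than the real 100-million-row tables, so it only illustrates how many rows come back and says nothing about speed.

```python
import sqlite3

# In-memory toy database; the rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DomesticSalesOrders (
        SalesOrderId INTEGER PRIMARY KEY, SalesAmount INTEGER);
    CREATE TABLE InternationalSalesOrders (
        SalesOrderId INTEGER PRIMARY KEY, SalesAmount INTEGER);
    INSERT INTO DomesticSalesOrders VALUES (1, 100), (2, 250);
    INSERT INTO InternationalSalesOrders VALUES (1, 400), (2, 50), (3, 200);
""")

# Option 1: combine the raw rows first, then aggregate once.
option1_rows = conn.execute("""
    SELECT COUNT(*) AS NumberOfSales, SUM(SalesAmount) AS TotalSalesAmount
    FROM (
        SELECT SalesOrderId, SalesAmount FROM DomesticSalesOrders
        UNION ALL
        SELECT SalesOrderId, SalesAmount FROM InternationalSalesOrders
    ) AS p
""").fetchall()

# Option 2: aggregate each table separately and stack the two results.
option2_rows = conn.execute("""
    SELECT COUNT(*) AS NumberOfSales, SUM(SalesAmount) AS TotalSalesAmount
    FROM DomesticSalesOrders
    UNION ALL
    SELECT COUNT(*) AS NumberOfSales, SUM(SalesAmount) AS TotalSalesAmount
    FROM InternationalSalesOrders
""").fetchall()

print("Option 1 returned", len(option1_rows), "row(s):", option1_rows)
print("Option 2 returned", len(option2_rows), "row(s):", option2_rows)
```

With this toy data, Option 1 yields one row with the combined count and sum, while Option 2 yields one row per table, which the report consumer would still have to add up.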
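A better answer is to keep the split subqueries from Option 2, so the engine can aggregate each table independently (and potentially in parallel), and then compute a second-level aggregate over the two partial results: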
SELECT
    SUM( [inner].NumberOfSales ) AS NumberOfSales,
    SUM( [inner].TotalSalesAmount ) AS TotalSalesAmount
FROM
    (
        SELECT
            COUNT(*) AS NumberOfSales,
            SUM( SalesAmount ) AS TotalSalesAmount
        FROM
            DomesticSalesOrders
        UNION ALL
        SELECT
            COUNT(*) AS NumberOfSales,
            SUM( SalesAmount ) AS TotalSalesAmount
        FROM
            InternationalSalesOrders
    ) AS [inner]
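Of course, you need to check how your RDBMS actually executes this and compare the execution plans. A sufficiently smart engine would generate the same execution plan for Option 1 in your question as for this query, but you can't always make that assumption.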
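As a rough illustration of "compare the results and the plans", the sketch below checks that the two-level aggregate returns the same single row as Option 1 and then prints each query's plan. It assumes SQLite (via Python's built-in sqlite3 module) and invented toy rows; the question does not name its RDBMS, so treat the EXPLAIN QUERY PLAN output as an example of the workflow only, since other engines expose plans through their own tools and may choose very different plans at 100 million rows. The derived table is aliased t here instead of [inner] purely for portability.

```python
import sqlite3

# In-memory toy database; the rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DomesticSalesOrders (
        SalesOrderId INTEGER PRIMARY KEY, SalesAmount INTEGER);
    CREATE TABLE InternationalSalesOrders (
        SalesOrderId INTEGER PRIMARY KEY, SalesAmount INTEGER);
    INSERT INTO DomesticSalesOrders VALUES (1, 100), (2, 250);
    INSERT INTO InternationalSalesOrders VALUES (1, 400), (2, 50), (3, 200);
""")

queries = {
    # Option 1 from the question: union the raw rows, aggregate once.
    "option_1": """
        SELECT COUNT(*) AS NumberOfSales, SUM(SalesAmount) AS TotalSalesAmount
        FROM (
            SELECT SalesOrderId, SalesAmount FROM DomesticSalesOrders
            UNION ALL
            SELECT SalesOrderId, SalesAmount FROM InternationalSalesOrders
        ) AS p
    """,
    # Two-level aggregate: per-table partial results, then a final SUM.
    "two_level": """
        SELECT SUM(t.NumberOfSales) AS NumberOfSales,
               SUM(t.TotalSalesAmount) AS TotalSalesAmount
        FROM (
            SELECT COUNT(*) AS NumberOfSales, SUM(SalesAmount) AS TotalSalesAmount
            FROM DomesticSalesOrders
            UNION ALL
            SELECT COUNT(*) AS NumberOfSales, SUM(SalesAmount) AS TotalSalesAmount
            FROM InternationalSalesOrders
        ) AS t
    """,
}

# Run each query and capture SQLite's description of how it would execute it.
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
plans = {
    name: conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    for name, sql in queries.items()
}

for name in queries:
    print(name, "result:", results[name])
    for step in plans[name]:
        print("    plan step:", step[-1])  # last column is the plan text
```

Both queries should report the same totals; whether their plans match is exactly the engine-specific detail to check on the real system, ideally alongside measured run times.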