What is the best way to select data when the details can be held in either of two tables?

Asked: 2013-10-09 11:14:01

Tags: sql sql-server

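I have a database that holds product information, specifically the packaging weights of products broken down by material. Not every product has actual packaging weights, so there is a system that groups products together in order to determine an average weight for them.

For example, if there is a new product "Can of beans", it might be placed in a group called "Cans". Other products in the "Cans" group will have packaging weights, so a calculation is needed to determine the average weight (by material) for that group.

When presenting the weight data, I want to use the actual weights where they are available and fall back to the group weights otherwise. The problem is that the relationship between a product and its actual weights/group weights is one-to-many, so if a product has both actual weights and group weights, multiple rows of duplicate data can be returned.

In the live system there are about 10 million products and over 3 million weights, so I need a solution that performs well.

My current approach is to just select all the rows and then take the AVG of the weights, but this seems a rather "clunky" solution. Is there a better way?

I have a (rather long) example using made-up data: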

DECLARE @Product TABLE (
    ProductId INT,
    GroupId INT,
    ProductName VARCHAR(50),
    PRIMARY KEY (ProductId));
DECLARE @Group TABLE (
    GroupId INT,
    GroupName VARCHAR(50),
    PRIMARY KEY (GroupId));
DECLARE @Material TABLE (
    MaterialId INT,
    MaterialName VARCHAR(50),
    PRIMARY KEY (MaterialId));
DECLARE @ProductWeight TABLE (
    ProductId INT,
    MaterialId INT,
    [Weight] NUMERIC(19,2),
    PRIMARY KEY (ProductId, MaterialId));
DECLARE @GroupWeight TABLE (
    GroupId INT,
    MaterialId INT,
    [Weight] NUMERIC(19,2),
    PRIMARY KEY (GroupId, MaterialId));

--Materials, only three for this example
INSERT INTO @Material VALUES (1, 'Paper');
INSERT INTO @Material VALUES (2, 'Steel');
INSERT INTO @Material VALUES (3, 'Glass');

--Two groups, one for cans and one for bottles
INSERT INTO @Group VALUES (1, 'Cans');
INSERT INTO @Group VALUES (2, 'Bottles');

--Five products, two "cans" and three "bottles"
INSERT INTO @Product VALUES (1, 1, 'Can of soup');
INSERT INTO @Product VALUES (2, 1, 'Can of beans');
INSERT INTO @Product VALUES (3, 2, 'Bottle of beer');
INSERT INTO @Product VALUES (4, 2, 'Bottle of wine');
INSERT INTO @Product VALUES (5, 2, 'Bottle of sauce');

--Three products have actual weights
INSERT INTO @ProductWeight VALUES (1, 1, 5.2);
INSERT INTO @ProductWeight VALUES (1, 2, 23.1);
INSERT INTO @ProductWeight VALUES (3, 1, 4.6);
INSERT INTO @ProductWeight VALUES (3, 2, 2.4);
INSERT INTO @ProductWeight VALUES (3, 3, 185.9);
INSERT INTO @ProductWeight VALUES (4, 1, 5.1);
INSERT INTO @ProductWeight VALUES (4, 2, 2.6);
INSERT INTO @ProductWeight VALUES (4, 3, 650.4);

--Calculate the group weights
INSERT INTO @GroupWeight 
SELECT p.GroupId, pw.MaterialId, AVG(pw.[Weight]) 
FROM @ProductWeight pw INNER JOIN @Product p ON p.ProductId = pw.ProductId
GROUP BY p.GroupId, pw.MaterialId;

--Now display the product information, use the actual weights where available and the group weights otherwise
SELECT
    p.ProductName,
    m.MaterialName,
    CASE WHEN pw.[Weight] IS NOT NULL THEN 'Product' ELSE 'Group' END AS WeightSource,
    AVG(COALESCE(pw.[Weight], gw.[Weight])) AS [Weight]
FROM
    @Product p
    LEFT JOIN @ProductWeight pw ON pw.ProductId = p.ProductId
    LEFT JOIN @GroupWeight gw ON gw.GroupId = p.GroupId
    LEFT JOIN @Material m ON m.MaterialId = COALESCE(pw.MaterialId, gw.MaterialId)
GROUP BY
    p.ProductName,
    m.MaterialName,
    CASE WHEN pw.[Weight] IS NOT NULL THEN 'Product' ELSE 'Group' END;

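When this runs, it returns the data in the format I want, including the weight source, i.e. whether it is an actual weight or a group weight: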

ProductName     MaterialName    WeightSource    Weight
Bottle of beer  Glass           Product         185.900000
Bottle of beer  Paper           Product         4.600000
Bottle of beer  Steel           Product         2.400000
Bottle of sauce Glass           Group           418.150000
Bottle of sauce Paper           Group           4.850000
Bottle of sauce Steel           Group           2.500000
Bottle of wine  Glass           Product         650.400000
Bottle of wine  Paper           Product         5.100000
Bottle of wine  Steel           Product         2.600000
Can of beans    Paper           Group           5.200000
Can of beans    Steel           Group           23.100000
Can of soup     Paper           Product         5.200000
Can of soup     Steel           Product         23.100000

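But I can't help feeling that there must be a more efficient way to do this?

Edit - I started working with UNION ALL; maybe I'm missing something, as this is the best I could come up with?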

WITH RawData AS (
SELECT
    p.ProductName,
    m.MaterialName,
    'Product' AS WeightSource,
    pw.[Weight]
FROM
    @Product p
    INNER JOIN @ProductWeight pw ON pw.ProductId = p.ProductId
    INNER JOIN @Material m ON m.MaterialId = pw.MaterialId
UNION ALL
SELECT
    p.ProductName,
    m.MaterialName,
    'Group' AS WeightSource,
    gw.[Weight]
FROM
    @Product p
    INNER JOIN @GroupWeight gw ON gw.GroupId = p.GroupId
    INNER JOIN @Material m ON m.MaterialId = gw.MaterialId),
RankedWeightSource AS (
SELECT
    ProductName,
    WeightSource,
    ROW_NUMBER() OVER (PARTITION BY ProductName ORDER BY WeightSource DESC) AS RowRank
FROM
    RawData
GROUP BY 
    ProductName,
    WeightSource),
BestWeightSource AS (
SELECT
    ProductName,
    WeightSource
FROM
    RankedWeightSource
WHERE
    RowRank = 1)
SELECT 
    * 
FROM 
    RawData rd
    INNER JOIN BestWeightSource bws ON bws.ProductName = rd.ProductName AND bws.WeightSource = rd.WeightSource;

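1 answer:

Answer 0 (score: 1)

What I've done before in similar situations is to build a raw query that pulls in all the possible values, along with a precedence for each value, and then use an outer query with ROW_NUMBER to pick the value with the highest precedence.

I'll use your (excellent) sample data; everything below goes after the insert into @GroupWeight.

Here is our raw data: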

-- the product weights (use INNER JOIN to only find 
--   the products with their own weights)
SELECT
    p.ProductId,
    p.ProductName,
    m.MaterialId,
    m.MaterialName,
    pw.Weight,
    'Product' WeightSource,
    20 Precedence
FROM
    @Product p
    INNER JOIN @ProductWeight pw ON pw.ProductId = p.ProductId
    INNER JOIN @Material m ON m.MaterialId = pw.MaterialId
UNION ALL
-- the group weight
SELECT
    p.ProductId,
    p.ProductName,
    m.MaterialId,
    m.MaterialName,
    gw.Weight,
    'Group' WeightSource,
    10 Precedence
FROM
    @Product p
    INNER JOIN @GroupWeight gw on gw.GroupId = p.GroupId
    INNER JOIN @Material m ON m.MaterialId = gw.MaterialId

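This returns one row for each product/material pair that has its own specific weight, plus one row for each product/material pair covered by a group weight. Each row says whether it is a product weight or a group weight.

We can then number the rows, ordered by precedence: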

-- assume the above is in a CTE named AllWeights
SELECT 
    *,
    ROW_NUMBER() OVER (PARTITION BY ProductId, MaterialId 
                       ORDER BY Precedence DESC) rn
FROM 
    AllWeights

This gives us the same data, plus an indicator of which row is the relevant one for a given product/material, so finally we can get:

-- assume the above is in a CTE named RowNumbered
SELECT
    ProductName,
    MaterialName,
    WeightSource,
    Weight
FROM
    RowNumbered
WHERE
    rn = 1
;

And we're done.

Putting it all together:
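The numbering step can also be tried in isolation. The following is only a minimal sketch: the inline VALUES rows stand in for the AllWeights output for the Paper material of products 1 and 2 from the sample data, and the derived-table alias v is an arbitrary name chosen for the illustration.

```sql
-- Stand-in rows for AllWeights: product 1 has both a product and a group
-- weight for material 1, product 2 only has the group weight.
WITH AllWeights AS (
    SELECT *
    FROM (VALUES
        (1, 1, 5.20, 'Product', 20),
        (1, 1, 5.20, 'Group',   10),
        (2, 1, 5.20, 'Group',   10)
    ) AS v (ProductId, MaterialId, [Weight], WeightSource, Precedence)
)
SELECT
    ProductId,
    MaterialId,
    WeightSource,
    [Weight],
    -- rn = 1 marks the highest-precedence row within each product/material pair
    ROW_NUMBER() OVER (PARTITION BY ProductId, MaterialId
                       ORDER BY Precedence DESC) AS rn
FROM
    AllWeights
ORDER BY
    ProductId, MaterialId, rn;
```

For product 1 the 'Product' row gets rn = 1 and the 'Group' row gets rn = 2; for product 2 the only row, the 'Group' one, gets rn = 1. Filtering on rn = 1 therefore keeps the product weight where one exists and the group weight otherwise.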

;WITH AllWeights AS (
-- the product weights (use INNER JOIN to only find 
--   the products with their own weights)
SELECT
    p.ProductId,
    p.ProductName,
    m.MaterialId,
    m.MaterialName,
    pw.Weight,
    'Product' WeightSource,
    20 Precedence
FROM
    @Product p
    INNER JOIN @ProductWeight pw ON pw.ProductId = p.ProductId
    INNER JOIN @Material m ON m.MaterialId = pw.MaterialId
UNION ALL
-- the group weight
SELECT
    p.ProductId,
    p.ProductName,
    m.MaterialId,
    m.MaterialName,
    gw.Weight,
    'Group' WeightSource,
    10 Precedence
FROM
    @Product p
    INNER JOIN @GroupWeight gw on gw.GroupId = p.GroupId
    INNER JOIN @Material m ON m.MaterialId = gw.MaterialId
),
RowNumbered AS (
SELECT 
    *,
    ROW_NUMBER() OVER (PARTITION BY ProductId, MaterialId 
                       ORDER BY Precedence DESC) rn
FROM 
    AllWeights
)
SELECT
    ProductName,
    MaterialName,
    WeightSource,
    Weight
FROM
    RowNumbered
WHERE
    rn = 1
;

Output:

ProductName          MaterialName WeightSource Weight
-------------------- ------------ ------------ ------------
Can of soup          Paper        Product      5.20
Can of soup          Steel        Product      23.10
Can of beans         Paper        Group        5.20
Can of beans         Steel        Group        23.10
Bottle of beer       Paper        Product      4.60
Bottle of beer       Steel        Product      2.40
Bottle of beer       Glass        Product      185.90
Bottle of wine       Paper        Product      5.10
Bottle of wine       Steel        Product      2.60
Bottle of wine       Glass        Product      650.40
Bottle of sauce      Paper        Group        4.85
Bottle of sauce      Steel        Group        2.50
Bottle of sauce      Glass        Group        418.15

Apart from the ordering, I think it is the same as yours.

Of course, you'll have to check the performance yourself.
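One caveat, as far as I can tell: the results match on the sample data because every product there either has actual weights for all of its group's materials or has none at all. The query above picks the winner per product and material, whereas the queries in the question choose the source per product. The sketch below uses a hypothetical product 6 (not in the sample data) that has its own Paper weight but no Steel weight; the column names PerMaterial and PerProduct are made up for the comparison.

```sql
WITH AllWeights AS (
    SELECT *
    FROM (VALUES
        -- hypothetical product 6: own weight for Paper only
        (6, 'Paper',  5.00, 'Product', 20),
        (6, 'Paper',  5.20, 'Group',   10),
        (6, 'Steel', 23.10, 'Group',   10)
    ) AS v (ProductId, MaterialName, [Weight], WeightSource, Precedence)
),
Ranked AS (
    SELECT
        *,
        -- per product and material, as in the answer
        ROW_NUMBER() OVER (PARTITION BY ProductId, MaterialName
                           ORDER BY Precedence DESC) AS PerMaterial,
        -- per product only, similar in effect to the question's queries
        DENSE_RANK() OVER (PARTITION BY ProductId
                           ORDER BY Precedence DESC) AS PerProduct
    FROM
        AllWeights
)
SELECT
    MaterialName,
    WeightSource,
    [Weight],
    CASE WHEN PerMaterial = 1 THEN 'kept' ELSE 'dropped' END AS PerMaterialFallback,
    CASE WHEN PerProduct  = 1 THEN 'kept' ELSE 'dropped' END AS PerProductFallback
FROM
    Ranked;
```

With the per-material rule, the Paper product weight and the Steel group weight are both kept; with the per-product rule, only the Paper product weight survives and Steel disappears from the result. Which behaviour is wanted depends on the business rule, so it is worth deciding before comparing the two queries on real data.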
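One way to do that check in SQL Server is to switch on the I/O and timing statistics around each candidate query and compare the figures shown in the Messages output. This is only a sketch: the small VALUES query is a stand-in for the query under test.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Stand-in for the query under test; replace it with each candidate in turn
SELECT
    ProductId,
    MaterialId,
    [Weight]
FROM
    (VALUES (1, 1, 5.20), (1, 2, 23.10)) AS v (ProductId, MaterialId, [Weight]);

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Timings taken against the table variables in the example are unlikely to reflect the live system, since table variables carry no column statistics and the sample is tiny, so the comparison is probably only meaningful against the real, indexed tables.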