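I have a database containing product information, specifically the packaging weight of each product broken down by material. Not every product has actual packaging weights, so there is a system that estimates them by grouping products together and averaging the weights within each group.
For example, a new product "Can of beans" might be put into a group named "Cans". Other products in the "Cans" group will have packaging weights, so a calculation is run to determine the average weight (by material) for the group.
When presenting the weight data I want to use the actual weights where they are available and fall back to the group weights where they are not. The problem is that the relationship between a product and its actual weights/group weights is one-to-many, so if a product has both actual weights and group weights, multiple rows of duplicated data can be returned.
In the live system there are roughly 10 million products and over 3 million weights, so I need a solution that performs well.
My current approach is to select all the rows and then take the AVG of the weights, but that feels like a rather clunky solution. Is there a better way?
Here is a (rather long) example using made-up data: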
DECLARE @Product TABLE (
ProductId INT,
GroupId INT,
ProductName VARCHAR(50),
PRIMARY KEY (ProductId));
DECLARE @Group TABLE (
GroupId INT,
GroupName VARCHAR(50),
PRIMARY KEY (GroupId));
DECLARE @Material TABLE (
MaterialId INT,
MaterialName VARCHAR(50),
PRIMARY KEY (MaterialId));
DECLARE @ProductWeight TABLE (
ProductId INT,
MaterialId INT,
[Weight] NUMERIC(19,2),
PRIMARY KEY (ProductId, MaterialId));
DECLARE @GroupWeight TABLE (
GroupId INT,
MaterialId INT,
[Weight] NUMERIC(19,2),
PRIMARY KEY (GroupId, MaterialId));
--Materials, only three for this example
INSERT INTO @Material VALUES (1, 'Paper');
INSERT INTO @Material VALUES (2, 'Steel');
INSERT INTO @Material VALUES (3, 'Glass');
--Two groups, one for cans and one for bottles
INSERT INTO @Group VALUES (1, 'Cans');
INSERT INTO @Group VALUES (2, 'Bottles');
--Five products, two "cans" and three "bottles"
INSERT INTO @Product VALUES (1, 1, 'Can of soup');
INSERT INTO @Product VALUES (2, 1, 'Can of beans');
INSERT INTO @Product VALUES (3, 2, 'Bottle of beer');
INSERT INTO @Product VALUES (4, 2, 'Bottle of wine');
INSERT INTO @Product VALUES (5, 2, 'Bottle of sauce');
--Three products have actual weights
INSERT INTO @ProductWeight VALUES (1, 1, 5.2);
INSERT INTO @ProductWeight VALUES (1, 2, 23.1);
INSERT INTO @ProductWeight VALUES (3, 1, 4.6);
INSERT INTO @ProductWeight VALUES (3, 2, 2.4);
INSERT INTO @ProductWeight VALUES (3, 3, 185.9);
INSERT INTO @ProductWeight VALUES (4, 1, 5.1);
INSERT INTO @ProductWeight VALUES (4, 2, 2.6);
INSERT INTO @ProductWeight VALUES (4, 3, 650.4);
--Calculate the group weights
INSERT INTO @GroupWeight
SELECT p.GroupId, pw.MaterialId, AVG(pw.[Weight])
FROM @ProductWeight pw INNER JOIN @Product p ON p.ProductId = pw.ProductId
GROUP BY p.GroupId, pw.MaterialId;
--Now display the product information, use the actual weights where available and the group weights otherwise
SELECT
p.ProductName,
m.MaterialName,
CASE WHEN pw.[Weight] IS NOT NULL THEN 'Product' ELSE 'Group' END AS WeightSource,
AVG(COALESCE(pw.[Weight], gw.[Weight])) AS [Weight]
FROM
@Product p
LEFT JOIN @ProductWeight pw ON pw.ProductId = p.ProductId
LEFT JOIN @GroupWeight gw ON gw.GroupId = p.GroupId
LEFT JOIN @Material m ON m.MaterialId = COALESCE(pw.MaterialId, gw.MaterialId)
GROUP BY
p.ProductName,
m.MaterialName,
CASE WHEN pw.[Weight] IS NOT NULL THEN 'Product' ELSE 'Group' END;
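When this runs it returns the data in the format I want, including the weight source, i.e. whether it is an actual weight or a group weight: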
ProductName      MaterialName  WeightSource  Weight
---------------  ------------  ------------  ----------
Bottle of beer   Glass         Product       185.900000
Bottle of beer   Paper         Product       4.600000
Bottle of beer   Steel         Product       2.400000
Bottle of sauce  Glass         Group         418.150000
Bottle of sauce  Paper         Group         4.850000
Bottle of sauce  Steel         Group         2.500000
Bottle of wine   Glass         Product       650.400000
Bottle of wine   Paper         Product       5.100000
Bottle of wine   Steel         Product       2.600000
Can of beans     Paper         Group         5.200000
Can of beans     Steel         Group         23.100000
Can of soup      Paper         Product       5.200000
Can of soup      Steel         Product       23.100000
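For illustration, the duplication that the AVG hides can be made visible by running the same joins without the GROUP BY for a single product. This is only a sketch: the column aliases ProductWeight, GroupMaterialId and GroupWeight are made up for this example, and it has to run in the same batch as the table variable declarations above.

```sql
-- Same joins as the query above, minus the aggregation, for one product only.
-- The aliases below are illustrative names, not part of the original query.
SELECT
    p.ProductName,
    m.MaterialName,
    pw.[Weight]   AS ProductWeight,
    gw.MaterialId AS GroupMaterialId,
    gw.[Weight]   AS GroupWeight
FROM
    @Product p
    LEFT JOIN @ProductWeight pw ON pw.ProductId = p.ProductId
    LEFT JOIN @GroupWeight gw ON gw.GroupId = p.GroupId
    LEFT JOIN @Material m ON m.MaterialId = COALESCE(pw.MaterialId, gw.MaterialId)
WHERE
    p.ProductName = 'Can of soup';
```

With the sample data, "Can of soup" has two product weights and its group has two group weights, and because the join to @GroupWeight matches on GroupId only, every product weight row is repeated once per group weight row. The result is four rows instead of two, which is what the AVG then collapses.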
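But I can't help feeling there must be a more efficient way to do this?
Edit - I have started using UNION ALL instead. Maybe I'm missing something, because this is the best I have been able to come up with: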
WITH RawData AS (
SELECT
p.ProductName,
m.MaterialName,
'Product' AS WeightSource,
pw.[Weight]
FROM
@Product p
INNER JOIN @ProductWeight pw ON pw.ProductId = p.ProductId
INNER JOIN @Material m ON m.MaterialId = pw.MaterialId
UNION ALL
SELECT
p.ProductName,
m.MaterialName,
'Group' AS WeightSource,
gw.[Weight]
FROM
@Product p
INNER JOIN @GroupWeight gw ON gw.GroupId = p.GroupId
INNER JOIN @Material m ON m.MaterialId = gw.MaterialId),
RankedWeightSource AS (
SELECT
ProductName,
WeightSource,
ROW_NUMBER() OVER (PARTITION BY ProductName ORDER BY WeightSource DESC) AS RowRank
FROM
RawData
GROUP BY
ProductName,
WeightSource),
BestWeightSource AS (
SELECT
ProductName,
WeightSource
FROM
RankedWeightSource
WHERE
RowRank = 1)
SELECT
*
FROM
RawData rd
INNER JOIN BestWeightSource bws ON bws.ProductName = rd.ProductName AND bws.WeightSource = rd.WeightSource;
Answer 0 (score: 1)
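What I have done before in similar situations is to write a raw query that brings in all the possible values together with a precedence for each value, and then use ROW_NUMBER in an outer query to pick the value with the highest precedence.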
I'll use your (excellent) sample data; everything below runs after the insert into @GroupWeight.
Here is our raw data:
-- the product weights (use INNER JOIN to only find
-- the products with their own weights)
SELECT
p.ProductId,
p.ProductName,
m.MaterialId,
m.MaterialName,
pw.Weight,
'Product' WeightSource,
20 Precedence
FROM
@Product p
INNER JOIN @ProductWeight pw ON pw.ProductId = p.ProductId
INNER JOIN @Material m ON m.MaterialId = pw.MaterialId
UNION ALL
-- the group weight
SELECT
p.ProductId,
p.ProductName,
m.MaterialId,
m.MaterialName,
gw.Weight,
'Group' WeightSource,
10 Precedence
FROM
@Product p
INNER JOIN @GroupWeight gw on gw.GroupId = p.GroupId
INNER JOIN @Material m ON m.MaterialId = gw.MaterialId
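This returns one row for each product-material combination that has its own weight, plus one row for each product-material combination available from the product's group. Each row indicates whether it is a product weight or a group weight, and carries a Precedence value (20 for product, 10 for group).
We can then number the rows within each product-material combination, ordered by precedence: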
-- assume the above is in a CTE named AllWeights
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY ProductId, MaterialId
ORDER BY Precedence DESC) rn
FROM
AllWeights
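This gives us the same data plus a row number indicating which row is the relevant one for a given product-material combination (rn = 1 is the row with the highest precedence), so finally we can get: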
-- assume the above is in a CTE named RowNumbered
SELECT
ProductName,
MaterialName,
WeightSource,
Weight
FROM
RowNumbered
WHERE
rn = 1
;
And we're done.
All together:
;WITH AllWeights AS (
-- the product weights (use INNER JOIN to only find
-- the products with their own weights)
SELECT
p.ProductId,
p.ProductName,
m.MaterialId,
m.MaterialName,
pw.Weight,
'Product' WeightSource,
20 Precedence
FROM
@Product p
INNER JOIN @ProductWeight pw ON pw.ProductId = p.ProductId
INNER JOIN @Material m ON m.MaterialId = pw.MaterialId
UNION ALL
-- the group weight
SELECT
p.ProductId,
p.ProductName,
m.MaterialId,
m.MaterialName,
gw.Weight,
'Group' WeightSource,
10 Precedence
FROM
@Product p
INNER JOIN @GroupWeight gw on gw.GroupId = p.GroupId
INNER JOIN @Material m ON m.MaterialId = gw.MaterialId
),
RowNumbered AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY ProductId, MaterialId
ORDER BY Precedence DESC) rn
FROM
AllWeights
)
SELECT
ProductName,
MaterialName,
WeightSource,
Weight
FROM
RowNumbered
WHERE
rn = 1
;
Output:
ProductName MaterialName WeightSource Weight
-------------------- ------------ ------------ ------------
Can of soup Paper Product 5.20
Can of soup Steel Product 23.10
Can of beans Paper Group 5.20
Can of beans Steel Group 23.10
Bottle of beer Paper Product 4.60
Bottle of beer Steel Product 2.40
Bottle of beer Glass Product 185.90
Bottle of wine Paper Product 5.10
Bottle of wine Steel Product 2.60
Bottle of wine Glass Product 650.40
Bottle of sauce Paper Group 4.85
Bottle of sauce Steel Group 2.50
Bottle of sauce Glass Group 418.15
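Apart from the row order, I believe it is the same as yours.
Of course, you will have to check the performance yourself against your real data.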
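As one possible candidate to compare against when checking performance, here is a minimal sketch that avoids both the UNION ALL and the ROW_NUMBER. It is not part of the approach above and rests on two assumptions: every product belongs to a group, and every product-material weight has a matching row in @GroupWeight (true for the sample data, because the group weights are averages of the product weights). Like the ROW_NUMBER query, it chooses the source per product and material. SET STATISTICS IO and SET STATISTICS TIME are standard SQL Server session options that report reads and elapsed time for each statement.

```sql
-- Sketch only; must run in the same batch as the table variable declarations.
-- Assumes each product-material weight also exists as a group weight row.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT
    p.ProductName,
    m.MaterialName,
    -- A matching @ProductWeight row means the product has its own weight
    CASE WHEN pw.ProductId IS NOT NULL THEN 'Product' ELSE 'Group' END AS WeightSource,
    COALESCE(pw.[Weight], gw.[Weight]) AS [Weight]
FROM
    @Product p
    -- One row per product and material, taken from the product's group
    INNER JOIN @GroupWeight gw ON gw.GroupId = p.GroupId
    INNER JOIN @Material m ON m.MaterialId = gw.MaterialId
    -- Match on material as well, so the join cannot multiply rows
    LEFT JOIN @ProductWeight pw ON pw.ProductId = p.ProductId
                               AND pw.MaterialId = gw.MaterialId;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

On the sample data this should return the same 13 rows as the query above, in no guaranteed order. Table variables can behave quite differently from indexed permanent tables, so any timing comparison is only meaningful when run against the real tables.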