Question

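Background:

There is a table "ProductCosts". The first sample data set shows correctly inserted data. The data is entered via Excel and ingested by an ETL process. The table holds several cost columns. "4_Costs" is the most recent cost, followed by "3_Costs", and so on.

In this case "3_Costs" is the most recently supplied cost: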

Category                Product ISOMonth    1_Costs     2_Costs     3_Costs     4_Costs     
----------------------------------------------------------------------------------------
ProductCategory1        Stuff   2017-10     40,000.00   40,000.00   50,000.00   NULL    
ProductCategory1        Stuff   2017-10     10,000.00   10,000.00   00.00       NULL    
ProductCategory1        Stuff   2017-10     10,000.00   10,000.00   00.00       NULL

In the second and third rows you can see that the 10,000.00 from "2_Costs" has been replaced by 00.00 in "3_Costs". To identify the CurrentCosts, the following simple logic is applied (see COALESCE):

SELECT Category
    , Product
    , ISOMonth
    , COALESCE([4_Costs], [3_Costs], [2_Costs], [1_Costs]) AS CurrentCosts
FROM [ProductCosts]

Correct result:

Category                Product ISOMonth    CurrentCosts
-----------------------------------------------------------
ProductCategory1        Stuff   2017-10     50,000.00
ProductCategory1        Stuff   2017-10     00.00
ProductCategory1        Stuff   2017-10     00.00

In the end, CurrentCosts adds up to 50,000.00. This works well as long as the input data is correct.
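A minimal sketch of that totalling step, assuming SQL Server syntax. The inline rows are copied from the first sample data set and only stand in for the real table; the alias TotalCurrentCosts is made up for this example.

```sql
-- Inline stand-in for the real ProductCosts table (rows from the correct sample data).
WITH ProductCosts (Category, Product, ISOMonth, [1_Costs], [2_Costs], [3_Costs], [4_Costs]) AS (
    SELECT 'ProductCategory1', 'Stuff', '2017-10', 40000.00, 40000.00, 50000.00, CAST(NULL AS numeric(10,2)) UNION ALL
    SELECT 'ProductCategory1', 'Stuff', '2017-10', 10000.00, 10000.00,     0.00, CAST(NULL AS numeric(10,2)) UNION ALL
    SELECT 'ProductCategory1', 'Stuff', '2017-10', 10000.00, 10000.00,     0.00, CAST(NULL AS numeric(10,2))
)
-- Pick the newest non-NULL cost per row, then total it per category, product and month.
SELECT Category
    , Product
    , ISOMonth
    , SUM(COALESCE([4_Costs], [3_Costs], [2_Costs], [1_Costs])) AS TotalCurrentCosts  -- hypothetical alias
FROM ProductCosts
GROUP BY Category, Product, ISOMonth;
```

For these three rows the per-row values are 50,000.00, 0.00 and 0.00, so the group total is 50,000.00.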

Incorrect data:

Category                Product ISOMonth    1_Costs     2_Costs     3_Costs     4_Costs     CurrentCosts
---------------------------------------------------------------------------------------------------------
ProductCategory1        Stuff   2017-10     40,000.00   40,000.00   50,000.00   NULL        50,000.00
ProductCategory1        Stuff   2017-10     10,000.00   10,000.00   NULL        NULL        10,000.00
ProductCategory1        Stuff   2017-10     10,000.00   10,000.00   NULL        NULL        10,000.00

In this case the user forgot to enter 00.00 in the second and third rows of column "3_Costs". This leads to a wrong result in the CurrentCosts column:

Category                Product ISOMonth    CurrentCosts
--------------------------------------------------------
ProductCategory1        Stuff   2017-10     50,000.00
ProductCategory1        Stuff   2017-10     10,000.00
ProductCategory1        Stuff   2017-10     10,000.00

In the end, CurrentCosts adds up to 70,000.00, which is wrong because the user forgot to overwrite the previous 10,000.00 with 00.00.

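Assertion: If a column such as "3_Costs" has a non-NULL value (here, for example, 50,000.00) in one row for a given category, product and month, its values in the other rows of that group must not be NULL.

Example of wrong data: See the data set "Incorrect data". If there is a value for "3_Costs" in the first row, there must also be a value in the second and third rows.

A SQL query that returns a flag, e.g. "has_incomplete_cost_column", would be fine. Then I would know that the data is inconsistent.

Constraint: I have to keep the existing data model and concept because it is already implemented this way. The input data is supplied as Excel sheets, so building a user interface that catches these errors is not an option.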

Answer 1

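How about using analytics (window functions) with a CASE, or subqueries, to get each column's total, and then using CASE/WHEN so that the same column is used for every row?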

Demo:

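The underlying problem is that the coalesce needs to happen on the column totals, not on the individual rows; you then display only the row value rather than the total.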

With ProductCosts(Category,Product, ISOMonth, [1_Costs], [2_Costs], [3_Costs], [4_Costs]) as (
SELECT 'ProductCategory1',        'Stuff',   '2017-10',     40000.00,   40000.00,   50000.00,   cast(NULL as numeric(10,2)) UNION ALL
SELECT 'ProductCategory1',        'Stuff',   '2017-10',     10000.00,   10000.00,   NULL ,     cast(NULL as numeric(10,2)) UNION ALL
SELECT 'ProductCategory1',        'Stuff',   '2017-10',     10000.00,   10000.00,   NULL,       cast(NULL as numeric(10,2)) UNION ALL
SELECT 'ProductCategory1',        'Stuff',   '2017-10',     NULL,        NULL,        NULL,        cast(NULL as numeric(10,2)))


Select Category, Product, ISOMonth, Case when sum([4_costs]) over (partition by Category, Product, ISOMonth) > 0 then [4_costs]
     when sum([3_Costs]) over (partition by Category, Product, ISOMonth)> 0 then [3_Costs]
     when sum([2_costs]) over (partition by Category, Product, ISOMonth)> 0 then [2_costs]
     when sum([1_Costs]) over (partition by Category, Product, ISOMonth)> 0 then [1_costs]
end as currentprice
from productCosts A

Giving us (using either the top or the bottom approach):

+----+------------------+---------+----------+--------------+
|    |     Category     | Product | ISOMonth | currentprice |
+----+------------------+---------+----------+--------------+
|  1 | ProductCategory1 | Stuff   | 2017-10  | 50000,00     |
|  2 | ProductCategory1 | Stuff   | 2017-10  | NULL         |
|  3 | ProductCategory1 | Stuff   | 2017-10  | NULL         |
|  4 | ProductCategory1 | Stuff   | 2017-10  | NULL         |
+----+------------------+---------+----------+--------------+

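A few notes:

I'm not a big fan of column names that start with a number, but that's my hang-up.
By using analytics, we identify the first column that has a value and consistently use its values.
The analytics essentially have to run for every record/row, which may be slower than computing the totals once (or once per column), putting the results into variables, and using the variables in the CASE.
I'm not certain about the partition for each analytic, so you may need to adjust it.
This allows the 0.00 to be missing in any row (including the first).
What if someone provides nothing but 0.00 in 4_Costs, and all totals should then be based on the 0.00 in 4_Costs? You can't control for that kind of human error. But you could check for a column whose total is 0 and report it as a "warning" so the user reviews the input.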
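The flag asked for in the question could be sketched as a per-group check. This is only a sketch, assuming SQL Server syntax: it relies on COUNT(column) skipping NULLs, so a column counts as incomplete when it is filled in for some rows of a group but not for all of them. The flag name comes from the question; the inline rows are the "incorrect data" sample and stand in for the real table.

```sql
-- Inline stand-in for the real ProductCosts table (rows from the incorrect sample data).
WITH ProductCosts (Category, Product, ISOMonth, [1_Costs], [2_Costs], [3_Costs], [4_Costs]) AS (
    SELECT 'ProductCategory1', 'Stuff', '2017-10', 40000.00, 40000.00, 50000.00, CAST(NULL AS numeric(10,2)) UNION ALL
    SELECT 'ProductCategory1', 'Stuff', '2017-10', 10000.00, 10000.00, NULL,     CAST(NULL AS numeric(10,2)) UNION ALL
    SELECT 'ProductCategory1', 'Stuff', '2017-10', 10000.00, 10000.00, NULL,     CAST(NULL AS numeric(10,2))
)
SELECT Category
    , Product
    , ISOMonth
    -- COUNT(col) counts non-NULL values only; COUNT(*) counts all rows in the group.
    -- A column is incomplete when 0 < COUNT(col) < COUNT(*).
    , CASE WHEN (COUNT([1_Costs]) > 0 AND COUNT([1_Costs]) < COUNT(*))
             OR (COUNT([2_Costs]) > 0 AND COUNT([2_Costs]) < COUNT(*))
             OR (COUNT([3_Costs]) > 0 AND COUNT([3_Costs]) < COUNT(*))
             OR (COUNT([4_Costs]) > 0 AND COUNT([4_Costs]) < COUNT(*))
           THEN 1 ELSE 0
      END AS has_incomplete_cost_column
FROM ProductCosts
GROUP BY Category, Product, ISOMonth;
```

For the sample rows the flag is 1, because [3_Costs] is filled in for only one of the three rows. A column that is NULL in every row of the group, like [4_Costs] here, is not flagged.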

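An alternative approach. Without testing, I'm not sure whether the repeated analytics or the subqueries are faster. I believe the subqueries are only evaluated once, while the analytics have to run for every row; but perhaps the engine knows this and optimizes accordingly.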

Select PC.Category, PC.Product, PC.ISOMonth, Case when D.[4_costs] > 0 then PC.[4_costs]
     when C.[3_Costs]> 0 then PC.[3_Costs]
     when B.[2_Costs]> 0 then PC.[2_costs]
     when A.[1_Costs]> 0 then PC.[1_costs]
end as currentprice
from productCosts PC
INNER join (Select sum([4_costs]) [4_costs], Category, product, ISOMonth from ProductCosts GROUP BY  Category, product, ISOMonth ) D
  on D.Category = PC.Category
 and D.Product = PC.Product
 and D.ISOMonth = PC.ISOMonth
INNER join (Select sum([3_costs]) [3_costs], Category, product, ISOMonth from ProductCosts Group by Category, product, ISOMonth) C
  on C.Category = PC.Category
 and C.Product = PC.Product
 and C.ISOMonth = PC.ISOMonth
INNER join (Select sum([2_costs]) [2_costs], Category, product, ISOMonth from ProductCosts Group by Category, product, ISOMonth ) B
  on B.Category = PC.Category
 and B.Product = PC.Product
 and B.ISOMonth = PC.ISOMonth
INNER join (Select sum([1_costs]) [1_costs], Category, product, ISOMonth from ProductCosts Group by Category, product, ISOMonth ) A
  on A.Category = PC.Category
 and A.Product = PC.Product
 and A.ISOMonth = PC.ISOMonth
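As a possible simplification (untested, SQL Server syntax assumed), the four joins could be collapsed into a single derived table that computes all four totals in one pass over the table. The aliases T and Sum1 to Sum4 are made up for this sketch, and the inline rows stand in for the real table.

```sql
-- Inline stand-in for the real ProductCosts table (rows from the incorrect sample data).
WITH ProductCosts (Category, Product, ISOMonth, [1_Costs], [2_Costs], [3_Costs], [4_Costs]) AS (
    SELECT 'ProductCategory1', 'Stuff', '2017-10', 40000.00, 40000.00, 50000.00, CAST(NULL AS numeric(10,2)) UNION ALL
    SELECT 'ProductCategory1', 'Stuff', '2017-10', 10000.00, 10000.00, NULL,     CAST(NULL AS numeric(10,2)) UNION ALL
    SELECT 'ProductCategory1', 'Stuff', '2017-10', 10000.00, 10000.00, NULL,     CAST(NULL AS numeric(10,2))
)
SELECT PC.Category
    , PC.Product
    , PC.ISOMonth
    -- Same rule as above: the newest column whose group total is positive wins for every row.
    , CASE WHEN T.Sum4 > 0 THEN PC.[4_Costs]
           WHEN T.Sum3 > 0 THEN PC.[3_Costs]
           WHEN T.Sum2 > 0 THEN PC.[2_Costs]
           WHEN T.Sum1 > 0 THEN PC.[1_Costs]
      END AS currentprice
FROM ProductCosts AS PC
INNER JOIN (
    -- One row per category, product and month, holding all four column totals.
    SELECT Category, Product, ISOMonth
        , SUM([1_Costs]) AS Sum1
        , SUM([2_Costs]) AS Sum2
        , SUM([3_Costs]) AS Sum3
        , SUM([4_Costs]) AS Sum4
    FROM ProductCosts
    GROUP BY Category, Product, ISOMonth
) AS T
   ON T.Category = PC.Category
  AND T.Product  = PC.Product
  AND T.ISOMonth = PC.ISOMonth;
```

The CASE logic is unchanged, so the same caveat applies: a column whose total is exactly 0 is skipped in favour of the next older column.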

Ensuring correct data entry with SQL by identifying wrong use of 0 & NULL (COALESCE logic)
