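I am reworking some legacy code and came across this calculation. Can anyone here point out what the reasoning behind it might be? The author is no longer with the company and there is no documentation.
Background: employee type is defined as the lowest grain. A weighted average is first calculated at that level, and it is then rolled up to the higher grain by calculating a weighted average again.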
employee  department  employee_type  salary  weight  location
A         X           F              1000    3.15    boston
B         X           P              300     1.27    NY
C         Y           F              2000    3.38    Tampa
D         Y           P              (null)  1.12    LA
E         X           F              3000    3.38    SFO
Query used to calculate the average salary per department:
select department,
       sum(case when avg_salary is not null then avg_salary*bonus else 0 end)
       / sum(case when avg_salary is not null then bonus else 1 end)
from (
    select employee, department, location, employee_type,
           sum(weight) as bonus,
           sum(case when salary is not null then salary*weight else 0 end)
           / sum(case when salary is not null then weight else 1 end) as avg_salary
    from employee
    group by employee, department, location, employee_type
) x
group by department
Output:
X 1752.69230769231
Y 1502.22222222222
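If we simply roll up at the lowest grain and then calculate the average salary at the higher grain, we get a different value.
So I guess the question is: is this a correct approach, and what is the rationale behind it? Is it only there to account for missing values?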
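To see what the nested query does with the missing salary, it can help to run the inner select on its own. The following is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's `sqlite3` module; SQLite is an assumption made only to keep the example self-contained, and the `case` expressions are copied from the query above.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employee TEXT, department TEXT, employee_type TEXT,"
    " salary REAL, weight REAL, location TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("A", "X", "F", 1000, 3.15, "boston"),
        ("B", "X", "P", 300, 1.27, "NY"),
        ("C", "Y", "F", 2000, 3.38, "Tampa"),
        ("D", "Y", "P", None, 1.12, "LA"),  # salary is missing
        ("E", "X", "F", 3000, 3.38, "SFO"),
    ],
)

# Inner query only: one group per employee with this data.
inner_sql = """
SELECT employee, department,
       SUM(weight) AS bonus,
       SUM(CASE WHEN salary IS NOT NULL THEN salary * weight ELSE 0 END)
         / SUM(CASE WHEN salary IS NOT NULL THEN weight ELSE 1 END) AS avg_salary
FROM employee
GROUP BY employee, department, location, employee_type
ORDER BY employee
"""
inner_rows = conn.execute(inner_sql).fetchall()
for row in inner_rows:
    print(row)
```

Employee D comes out of the inner query with avg_salary = 0 (0 divided by 1), not NULL, while keeping bonus = 1.12. The outer `is not null` checks therefore never exclude D, and department Y becomes (2000 * 3.38 + 0 * 1.12) / (3.38 + 1.12) = 6760 / 4.5, which is the 1502.22 shown in the output. In other words, the `else 0` / `else 1` branches guard against division by zero, but the missing salary is effectively counted as zero.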
Answer 0 (score: 1)
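This is a simple weighted average. (Think SumProduct in Excel.)
You may notice the NULLIF() in the denominator below. It is there to avoid the dreaded divide-by-zero. I'm sure you know this, but you can Group By any combination of fields (from the atomic level all the way up).
Example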
Declare @YourTable Table ([employee] varchar(50),[department] varchar(50),[employee_type] varchar(50),[salary] money,[weight] money,[location] varchar(50))
Insert Into @YourTable Values
('A','X','F',1000,3.15,'boston')
,('B','X','P',300,1.27,'NY')
,('C','Y','F',2000,3.38,'Tampa')
,('D','Y','P',null,1.12,'LA')
,('E','X','F',3000,3.38,'SFO')
Select Department
,WeigtedAvg = sum(Salary*Weight)/NullIf(sum(Weight),0)
From @YourTable
Group By Department
Returns
Department WeigtedAvg
X 1752.6923
Y 1502.2222
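As a sketch of the "any combination of fields" point, the same expression can be grouped one level lower, by department and employee type. This again uses SQLite through Python's `sqlite3` purely as an assumption for a self-contained run; the table name `emp` and the alias `weighted_avg` are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (department TEXT, employee_type TEXT, salary REAL, weight REAL)"
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        ("X", "F", 1000, 3.15),
        ("X", "P", 300, 1.27),
        ("Y", "F", 2000, 3.38),
        ("Y", "P", None, 1.12),  # salary is missing
        ("X", "F", 3000, 3.38),
    ],
)

# Same weighted-average expression, grouped at a finer grain.
by_type = {
    (dept, etype): avg
    for dept, etype, avg in conn.execute(
        """
        SELECT department, employee_type,
               SUM(salary * weight) / NULLIF(SUM(weight), 0) AS weighted_avg
        FROM emp
        GROUP BY department, employee_type
        """
    )
}
for key in sorted(by_type):
    print(key, by_type[key])
```

The (Y, P) group comes back as NULL (`None` in Python) because its only salary is missing, so SUM has nothing to add up. The (X, F) group is (1000 * 3.15 + 3000 * 3.38) / (3.15 + 3.38).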
Just for fun
Select Department
,WeigtedAvgBonus = sum(Salary*Weight)/NullIf(sum(Weight),0)
,WeigtedAvgRate = sum(Salary*Weight)/NullIf(sum(Salary),0)
From @YourTable
Group By Department
Returns
Department WeigtedAvgBonus WeigtedAvgRate
X 1752.6923 3.1793
Y 1502.2222 3.38 -- Notice this matches the only non-null observation in Y
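One thing worth checking against the business rule: with sum(Salary*Weight)/NullIf(sum(Weight),0), employee D's weight still counts in the denominator even though D's salary is null, which is why Y is 6760 / 4.5 = 1502.22 rather than 2000. If the intent were to ignore employees with no salary entirely (an assumption, since the thread does not say), the denominator could be restricted to rows with a salary. A minimal sketch, again using SQLite via Python's `sqlite3` for a self-contained run, with made-up names `emp`, `avg_all_weights` and `avg_known_only`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (department TEXT, salary REAL, weight REAL)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("X", 1000, 3.15),
        ("X", 300, 1.27),
        ("Y", 2000, 3.38),
        ("Y", None, 1.12),  # salary is missing
        ("X", 3000, 3.38),
    ],
)

sql = """
SELECT department,
       -- every weight counts, so a missing salary acts like a salary of 0
       SUM(salary * weight) / NULLIF(SUM(weight), 0) AS avg_all_weights,
       -- only weights of rows that actually have a salary
       SUM(salary * weight)
         / NULLIF(SUM(CASE WHEN salary IS NOT NULL THEN weight END), 0) AS avg_known_only
FROM emp
GROUP BY department
ORDER BY department
"""
results = {dept: (all_w, known) for dept, all_w, known in conn.execute(sql)}
for dept, values in results.items():
    print(dept, values)
```

For X the two columns agree because no salary is missing. For Y the second column is 6760 / 3.38 = 2000, the salary of the only employee with a known value. Which of the two is "right" depends on whether a missing salary is meant to pull the average down.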