I have a data table like this:
QuestionID UserName UserWeightingForQuestion AnswerGivenForQuestion Metric
1 A 1.50 1 ToBeCalculated
1 B 1.00 2 ToBeCalculated
1 C 1.80 3 ToBeCalculated
1 D 1.20 1 ToBeCalculated
1 E 1.40 2 ToBeCalculated
2 A 1.20 2 ToBeCalculated
2 B 1.20 2 ToBeCalculated
2 C 1.10 4 ToBeCalculated
2 D 1.20 5 ToBeCalculated
...
For each question group, I want to fill every cell in the Metric column with a value computed as follows:
Metric_For_User_A_For_QuestionID_X = SUM(Weights_With_The_Answer_Similar_To_What_Is_Given_By_User_A_In_QuestionID_Group = X) / DISTINCT(All_Weights_In_One_QuestionID_Group = X)
Specifically,
Metric_For_User_A_For_QuestionID_1 = SUM(1.50+1.20)/(1.50+1.00+1.80+1.20+1.40)
Metric_For_User_B_For_QuestionID_1 = SUM(1.00+1.40)/(1.50+1.00+1.80+1.20+1.40)
Metric_For_User_C_For_QuestionID_1 = SUM(1.80)/(1.50+1.00+1.80+1.20+1.40)
Metric_For_User_D_For_QuestionID_1 = SUM(1.50+1.20)/(1.50+1.00+1.80+1.20+1.40)
Metric_For_User_E_For_QuestionID_1 = SUM(1.00+1.40)/(1.50+1.00+1.80+1.20+1.40)
For the QuestionID = 2 group, I want to repeat the same process. For example,
Metric_For_User_A_For_QuestionID_2 = SUM(1.20+1.20)/(1.20+1.10)
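I am new to SQL, and I believe this can be done with OVER or some kind of aggregate function (?). If this calculation is possible in SQL, could someone with SQL expertise suggest a way to do it?
The original table has ~70m rows, and I am using SQL Server. Thank you very much for your suggestions and answers!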
Answer 0: (score: 1)
You can do this with the SUM window function.
select t.*,
sum(UserWeightingForQuestion) over(partition by questionID,AnswerGivenForQuestion)
/sum(UserWeightingForQuestion) over(partition by questionID) as metric
from tablename t
sum(UserWeightingForQuestion) over(partition by questionID)
gets the sum of all UserWeightingForQuestion values for each questionID.
sum(UserWeightingForQuestion) over(partition by questionID,AnswerGivenForQuestion)
sums the UserWeightingForQuestion values of the rows that gave the same answer within each questionID.
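To check the two window sums against the sample rows without a SQL Server instance, here is a minimal sketch that runs the same window-function SQL through Python's standard sqlite3 module. The harness, the `compute_metrics` helper, and the `SameAnswerWeight`/`AllWeight` aliases are assumptions made for illustration, and it presumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Sample rows from the question:
# (QuestionID, UserName, UserWeightingForQuestion, AnswerGivenForQuestion)
ROWS = [
    (1, "A", 1.50, 1), (1, "B", 1.00, 2), (1, "C", 1.80, 3),
    (1, "D", 1.20, 1), (1, "E", 1.40, 2),
    (2, "A", 1.20, 2), (2, "B", 1.20, 2), (2, "C", 1.10, 4), (2, "D", 1.20, 5),
]

# Same two partitions as the answer: (question, answer) and (question).
QUERY = """
SELECT QuestionID, UserName,
       SUM(UserWeightingForQuestion)
           OVER (PARTITION BY QuestionID, AnswerGivenForQuestion) AS SameAnswerWeight,
       SUM(UserWeightingForQuestion)
           OVER (PARTITION BY QuestionID) AS AllWeight
FROM tablename
ORDER BY QuestionID, UserName
"""

def compute_metrics(rows):
    """Hypothetical helper: load rows into an in-memory table and return
    {(QuestionID, UserName): metric rounded to 4 places}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablename (QuestionID INTEGER, UserName TEXT, "
        "UserWeightingForQuestion REAL, AnswerGivenForQuestion INTEGER)"
    )
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", rows)
    result = {
        (q, user): round(same / total, 4)
        for q, user, same, total in conn.execute(QUERY)
    }
    conn.close()
    return result

metrics = compute_metrics(ROWS)
for key in sorted(metrics):
    print(key, metrics[key])
```

For QuestionID 1 this gives 2.70/6.90 for users A and D, matching the question. For QuestionID 2 the denominator is 4.70, because the repeated weight 1.20 is counted three times; the question's example uses 2.30 (distinct weights only) instead.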
Edit: To sum the distinct weights per questionID in the denominator, use
select t.*,
sum(UserWeightingForQuestion) over(partition by questionID,AnswerGivenForQuestion)
/(select sum(distinct UserWeightingForQuestion) from tablename where t.questionID=questionID) as metric
from tablename t
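SQL Server does not accept DISTINCT inside an aggregate used with OVER, which is why the distinct sum above sits in a subquery. One variation that may be worth trying on a ~70m-row table is to compute the distinct sums once per questionID in a grouped derived table and join to it. The sketch below is untested for performance and only checks the arithmetic on the sample rows; the Python sqlite3 harness and the `d`/`DistinctWeight` names are assumptions for illustration.

```python
import sqlite3

# Sample data from the question, loaded into an in-memory SQLite table.
SETUP = """
CREATE TABLE tablename (
    QuestionID INTEGER, UserName TEXT,
    UserWeightingForQuestion REAL, AnswerGivenForQuestion INTEGER);
INSERT INTO tablename VALUES
    (1,'A',1.50,1),(1,'B',1.00,2),(1,'C',1.80,3),(1,'D',1.20,1),(1,'E',1.40,2),
    (2,'A',1.20,2),(2,'B',1.20,2),(2,'C',1.10,4),(2,'D',1.20,5);
"""

# Numerator: window sum per (question, answer).
# Denominator: sum of distinct weights per question, computed once in "d".
QUERY = """
SELECT t.QuestionID, t.UserName,
       SUM(t.UserWeightingForQuestion)
           OVER (PARTITION BY t.QuestionID, t.AnswerGivenForQuestion)
       / d.DistinctWeight AS Metric
FROM tablename t
JOIN (SELECT QuestionID,
             SUM(DISTINCT UserWeightingForQuestion) AS DistinctWeight
      FROM tablename
      GROUP BY QuestionID) d
  ON d.QuestionID = t.QuestionID
ORDER BY t.QuestionID, t.UserName
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
distinct_metrics = {
    (q, user): round(metric, 2) for q, user, metric in conn.execute(QUERY)
}
conn.close()

for key in sorted(distinct_metrics):
    print(key, distinct_metrics[key])
```

With a distinct denominator the metric can exceed 1: for QuestionID 2, users A and B get 2.40/2.30, since the numerator still counts every matching row while the denominator counts 1.20 only once.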
Answer 1: (score: 1)
declare @quest table(QuestionID int
, UserName varchar(20)
, UserWeightingForQuestion decimal(10,2)
, AnswerGivenForQuestion int);
insert into @quest values
(1,'A',1.50,1),(1,'B',1.00,2),(1,'C',1.80,3),(1,'D',1.20,1),
(1,'E',1.40,2),(2,'A',1.20,2),(2,'B',1.20,2),(2,'C',1.10,4),(2,'D',1.20,5);
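Basically, you aggregate at two levels: a window partitioned by QuestionID and AnswerGivenForQuestion for the numerator, and the sum of the distinct weights per QuestionID (a correlated subquery here) for the denominator.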
WITH CALC AS
(
SELECT Q2.QuestionID, Q2.UserName,
SUM(UserWeightingForQuestion) OVER (PARTITION BY QuestionID, AnswerGivenForQuestion) AS Weight,
(SELECT SUM(DISTINCT Q1.UserWeightingForQuestion)
FROM @quest Q1
WHERE Q1.QuestionID = Q2.QuestionID) AS AllWeights
FROM @quest Q2
)
SELECT QuestionID, UserName, Weight, AllWeights,
CAST(Weight / AllWeights AS DECIMAL(18,2)) as Metric
FROM CALC
ORDER BY QuestionID, UserName;
+------------+----------+--------+------------+--------+
| QuestionID | UserName | Weight | AllWeights | Metric |
+------------+----------+--------+------------+--------+
|          1 | A        |   2.70 |       6.90 |   0.39 |
|          1 | B        |   2.40 |       6.90 |   0.35 |
|          1 | C        |   1.80 |       6.90 |   0.26 |
|          1 | D        |   2.70 |       6.90 |   0.39 |
|          1 | E        |   2.40 |       6.90 |   0.35 |
+------------+----------+--------+------------+--------+
|          2 | A        |   2.40 |       2.30 |   1.04 |
|          2 | B        |   2.40 |       2.30 |   1.04 |
|          2 | C        |   1.10 |       2.30 |   0.48 |
|          2 | D        |   1.20 |       2.30 |   0.52 |
+------------+----------+--------+------------+--------+