我有一个像这样的数据集:
test <-
data.frame(
variable = c("A","A","B","B","C","D","E","E","E","F","F","G"),
confidence = c(1,0.6,0.1,0.15,1,0.3,0.4,0.5,0.2,1,0.4,0.9),
freq = c(2,2,2,2,1,1,3,3,3,2,2,1),
weight = c(2,2,0,0,1,3,5,5,5,0,0,4)
)
> test
variable confidence freq weight
1 A 1.00 2 2
2 A 0.60 2 2
3 B 0.10 2 0
4 B 0.15 2 0
5 C 1.00 1 1
6 D 0.30 1 3
7 E 0.40 3 5
8 E 0.50 3 5
9 E 0.20 3 5
10 F 1.00 2 0
11 F 0.40 2 0
12 G 0.90 1 4
我想通过每个变量的置信度来计算权重之和,如下所示: ,其中i是变量(A,B,C ......)
开发上述公式:
w[1]c[1]+w[1]c[2]=2*1+2*0.6=3.2
w[2]c[1]+w[2]c[2]
w[3]c[3]+w[3]c[4]
w[4]c[3]+w[4]c[4]
w[5]c[5]
w[6]c[6]
w[7]c[7]+w[7]c[8]+w[7]c[9]
w[8]c[7]+w[8]c[8]+w[8]c[9]
w[9]c[7]+w[9]c[8]+w[9]c[9]
…
结果应如下所示:
> test
variable confidence freq weight SWC
1 A 1.00 2 2 3.2
2 A 0.60 2 2 3.2
3 B 0.10 2 0 0.0
4 B 0.15 2 0 0.0
5 C 1.00 1 1 1.0
6 D 0.30 1 3 0.9
7 E 0.40 3 5 5.5
8 E 0.50 3 5 5.5
9 E 0.20 3 5 5.5
10 F 1.00 2 0 0.0
11 F 0.40 2 0 0.0
12 G 0.90 1 4 3.6
请注意,每个观察值的置信度值不同,但每个变量具有相同的权重,因此对于每个相同的变量观察值,我需要的总和是相同的。
首先,我尝试使用以下方法多次迭代每个变量:
> table(test$variable)
A B C D E F G
2 2 1 1 3 2 1
但我无法使其发挥作用。那么,我计算了每个变量开始的位置,试图使for循环仅在这些值中迭代:
> tpos = cumsum(table(test$variable))
> tpos = tpos+1
> tpos
A B C D E F G
3 5 6 7 10 12 13
> tpos = shift(tpos, 1)
> tpos
[1] NA 3 5 6 7 10 12
> tpos[1]=1
> tpos
[1] 1 3 5 6 7 10 12
# tpos is a vector with the positions where each variable (A, B, c...) start
> tposn = c(1:nrow(test))[-tpos]
> tposn
[1] 2 4 8 9 11
> c(1:nrow(test))[-tposn]
[1] 1 3 5 6 7 10 12
# then i came up with this loop but it doesn't give the correct result
for(i in 1:nrow(test)[-tposn]){
a = test$freq[i]-1
test$SWC[i:i+a] = sum(test$weight[i]*test$confidence[i:i+a])
}
也许有一种更简单的方法吗? tapply?
答案 0 :(得分:3)
使用dplyr
:
library(dplyr)
test %>%
group_by(variable) %>%
mutate(SWC=sum(confidence*weight))
# A tibble: 12 x 5
# Groups: variable [7]
variable confidence freq weight SWC
<fctr> <dbl> <dbl> <dbl> <dbl>
1 A 1.00 2 2 3.2
2 A 0.60 2 2 3.2
3 B 0.10 2 0 0.0
4 B 0.15 2 0 0.0
5 C 1.00 1 1 1.0
6 D 0.30 1 3 0.9
7 E 0.40 3 5 5.5
8 E 0.50 3 5 5.5
9 E 0.20 3 5 5.5
10 F 1.00 2 0 0.0
11 F 0.40 2 0 0.0
12 G 0.90 1 4 3.6