I'm an inexperienced R user and I need to do something fairly complicated. My dataset looks like this:
a, b, c, d, e are different individuals. I want to fill in column D as follows: on the last row for each individual in column A, D = sum(C)/(B-1), where the sum runs over that individual's rows.
The expected results look like this:
D4=sum(C2:C4)/(B4-1)=0.5
D6=sum(C5:C6)/(B6-1)=1, etc.
I tried something like this:
for(i in 2:NROW(dataset)){
  dataset[i,4]<-ifelse(
    (dataset[i,1]==dataset[i-1,1]),sum(dataset[i,3])/(dataset[i,2]-1),NA
  )
}
But this is obviously not sufficient: it computes D for every row that matches the previous one rather than only for the last row of each individual, and it does not sum the C values for that individual.
I really don't know how to work this out. Does anyone have any advice? Many thanks.
Answer 0 (score: 0)
If I understand your question correctly, here is one way to get the expected result:
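df <- data.frame(
  A=c("a","a","a","b","b","c","c","c","d","e","e"),
  B=c(3,3,3,2,2,3,3,3,1,2,2),
  C=c(NA,1,0,NA,1,NA,0,1,NA,NA,0),
  stringsAsFactors = FALSE)
# Start at row 1 so that an individual with a single row at the top is not skipped.
for(i in seq_len(nrow(df))){
  df[i,4]<-ifelse(
    # TRUE on the last row of an individual: the next row differs, or there is no next row.
    (df[i,1]!=df[i+1,1] | i == nrow(df)),
    # Sum of C over all rows of this individual (NA ignored), divided by B - 1.
    sum(df[df$A == df[i,1],]$C, na.rm=TRUE)/(df[i,2]-1),
    NA
  )
}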
This code produces the following table:
A B C V4
1 a 3 NA NA
2 a 3 1 NA
3 a 3 0 0.5
4 b 2 NA NA
5 b 2 1 1.0
6 c 3 NA NA
7 c 3 0 NA
8 c 3 1 0.5
9 d 1 NA NaN
10 e 2 NA NA
11 e 2 0 0.0
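ifelse first tests whether the individual in column A on the current row differs from the individual on the next row, or whether the current row is the last row of the data frame.
If it is the last row for that individual, it takes the sum of column C (ignoring NA) over all rows where column A holds that individual, and divides it by the value in column B minus 1.
Otherwise it puts NA in the fourth column.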
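The same logic can also be written without an explicit loop. The following is only a sketch in base R, not the approach from the answer itself: it assumes each individual's rows are contiguous (as in the example data), and the names group_sum and is_last are introduced here purely for illustration.

```r
# Same example data as in the answer above.
df <- data.frame(
  A = c("a","a","a","b","b","c","c","c","d","e","e"),
  B = c(3,3,3,2,2,3,3,3,1,2,2),
  C = c(NA,1,0,NA,1,NA,0,1,NA,NA,0),
  stringsAsFactors = FALSE)

# Per-individual sum of C (NA ignored), repeated on every row of that individual.
group_sum <- ave(df$C, df$A, FUN = function(x) sum(x, na.rm = TRUE))

# TRUE on the last row of each individual: no later row has the same A value.
is_last <- !duplicated(df$A, fromLast = TRUE)

# Fill D only on those last rows; every other row gets NA.
df$D <- ifelse(is_last, group_sum / (df$B - 1), NA_real_)
print(df)
```

Because ave() and duplicated() work on whole columns, this avoids growing the data frame row by row, which tends to matter on larger datasets.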
Answer 1 (score: 0)
Using dplyr
You can compute D for all rows, then remove it where it is not needed:
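library(dplyr)
# dftest is assumed to hold the same A, B, C columns as the example data.
dftest %>%
  group_by(A,B) %>%
  # Compute the per-group value on every row ...
  dplyr::mutate(D = sum(C, na.rm=TRUE)/(B-1)) %>%
  # ... then keep it only on the last row of each group.
  dplyr::mutate(D = if_else(row_number()== n(), D, as.double(NA)))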
which gives:
Source: local data frame [11 x 4]
Groups: A, B [5]
A B C D
<chr> <dbl> <dbl> <dbl>
1 a 3 NA NA
2 a 3 1 NA
3 a 3 0 0.5
4 b 2 NA NA
5 b 2 1 1.0
6 c 3 NA NA
7 c 3 0 NA
8 c 3 1 0.5
9 d 1 NA NaN
10 e 2 NA NA
11 e 2 0 0.0
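Row 9 is NaN in the output because individual d has B = 1, so the formula evaluates 0/0. If a plain NA is preferred there, one possible post-processing step is sketched below. Treating B = 1 as "undefined" is an assumption, not something the question states, and the vector D is simply copied from the D column of the output above.

```r
# D column copied from the output above (row 9 is the 0/0 case).
D <- c(NA, NA, 0.5, NA, 1, NA, NA, 0.5, NaN, NA, 0)

# is.nan() is TRUE only for NaN, not for ordinary NA values.
D[is.nan(D)] <- NA_real_
print(D)
```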