Although there are some similar questions, I couldn't find an answer that works for my case.
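I only want to replace the missing values (NA) in the mydf data frame with the minimum value observed for each COMPOUND (or for each COMPOUND and SUBJECT combination).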
Below is my best attempt as an MWE (my real dataset is very long, so an efficient approach such as data.table would be best):
set.seed(1)
mydf <- data.frame(SUBJECT=rep(paste('subject',1:10,sep='_'), each=15),
                   TREATMENT=rep(c('A','B'), each=75),
                   TIME=rep(rep(paste('t',0:2,sep=''), each=5), 10),
                   COMPOUND=rep(paste('COMP',1:5,sep='-'), 30),
                   VALUE=rnorm(n=150, mean=20, sd=5))
#INTRODUCE RANDOM NAs
mydf$VALUE[which(mydf$VALUE %in% sample(mydf$VALUE, 15))] <- NA
#CHECK WHICH ARE THE MIN VALUES PER COMPOUND
library(data.table)
DT <- data.table(mydf)
#DT[, list(min.val=min(VALUE, na.rm=TRUE)), by=list(SUBJECT=SUBJECT, COMPOUND=COMPOUND)]
DT[, list(min.val=min(VALUE, na.rm=TRUE)), by=COMPOUND]
#REPLACE MISSING VALUES WITH MIN VALUES PER COMPOUND
#mydf <- as.data.frame(DT[, VALUE2 := min(VALUE[!is.na(VALUE)]), by=list(SUBJECT=SUBJECT, COMPOUND=COMPOUND)])
mydf <- as.data.frame(DT[, VALUE2 := min(VALUE[!is.na(VALUE)]), by=COMPOUND])
mydf
As you can see, my attempt replaces every value with the minimum, but I only want to replace the NAs.
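The attempt overwrites every row because a grouped := whose right-hand side evaluates to a single value (here, the group minimum) is recycled across all rows of that group. A minimal sketch on a hypothetical five-row table toy (not the question's data) contrasts that behaviour with a conditional fill that only touches the NAs:

```r
library(data.table)

# Hypothetical toy data: group "a" and group "b" each contain one NA
toy <- data.table(COMPOUND = c("a", "a", "a", "b", "b"),
                  VALUE    = c(5, NA, 3, NA, 7))

# Same pattern as the attempt: the scalar group minimum is recycled to every row
toy[, ALL_MIN := min(VALUE, na.rm = TRUE), by = COMPOUND]

# Keep VALUE where present; use the group minimum only where VALUE is NA
toy[, VALUE2 := ifelse(is.na(VALUE), min(VALUE, na.rm = TRUE), VALUE), by = COMPOUND]

print(toy)
```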
This is what I want:
> head(mydf, 30)
     SUBJECT TREATMENT TIME COMPOUND     VALUE    VALUE2
1  subject_1         A   t0   COMP-1 16.867731 16.867731
2  subject_1         A   t0   COMP-2 20.918217 20.918217
3  subject_1         A   t0   COMP-3 15.821857 15.821857
4  subject_1         A   t0   COMP-4 27.976404 27.976404
5  subject_1         A   t0   COMP-5 21.647539 21.647539
6  subject_1         A   t1   COMP-1 15.897658 15.897658
7  subject_1         A   t1   COMP-2 22.437145 22.437145
8  subject_1         A   t1   COMP-3 23.691624 23.691624
9  subject_1         A   t1   COMP-4 22.878907 22.878907
10 subject_1         A   t1   COMP-5        NA 11.796972
11 subject_1         A   t2   COMP-1 27.558906 27.558906
12 subject_1         A   t2   COMP-2 21.949216 21.949216
13 subject_1         A   t2   COMP-3 16.893797 16.893797
14 subject_1         A   t2   COMP-4  8.926501  8.926501
15 subject_1         A   t2   COMP-5        NA 11.796972
16 subject_2         A   t0   COMP-1 19.775332 19.775332
17 subject_2         A   t0   COMP-2 19.919049 19.919049
18 subject_2         A   t0   COMP-3 24.719181 24.719181
19 subject_2         A   t0   COMP-4 24.106106 24.106106
20 subject_2         A   t0   COMP-5        NA 11.796972
21 subject_2         A   t1   COMP-1 24.594887 24.594887
22 subject_2         A   t1   COMP-2 23.910682 23.910682
23 subject_2         A   t1   COMP-3 20.372825 20.372825
24 subject_2         A   t1   COMP-4 10.053242 10.053242
25 subject_2         A   t1   COMP-5 23.099129 23.099129
26 subject_2         A   t2   COMP-1        NA 10.428203
27 subject_2         A   t2   COMP-2        NA 10.975207
28 subject_2         A   t2   COMP-3 12.646238 12.646238
29 subject_2         A   t2   COMP-4 17.609250 17.609250
30 subject_2         A   t2   COMP-5 22.089708 22.089708
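One possible data.table approach, sketched on a hypothetical eight-row table dt_toy rather than the real data: data.table's fcoalesce() keeps each non-NA VALUE and substitutes the group minimum only where VALUE is NA. Changing by switches between the per-COMPOUND version and the per-SUBJECT-and-COMPOUND version (the commented-out variant above). The column names VALUE2 and VALUE3 are illustrative.

```r
library(data.table)

# Hypothetical toy data with the same column names as the question's DT
dt_toy <- data.table(SUBJECT  = rep(c("s1", "s2"), each = 4),
                     COMPOUND = rep(c("C1", "C2"), 4),
                     VALUE    = c(10, NA, 4, 8, NA, 6, 2, NA))

# Per COMPOUND: non-NA values are kept, NAs get that compound's minimum
dt_toy[, VALUE2 := fcoalesce(VALUE, min(VALUE, na.rm = TRUE)), by = COMPOUND]

# Per SUBJECT and COMPOUND: the minimum is taken within each subject/compound pair
dt_toy[, VALUE3 := fcoalesce(VALUE, min(VALUE, na.rm = TRUE)), by = .(SUBJECT, COMPOUND)]

print(dt_toy)
```

Both assignments modify dt_toy by reference and keep the original row order. If every VALUE in a group is NA, min(..., na.rm = TRUE) returns Inf with a warning, so such a group would be filled with Inf rather than left as NA.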