Replace NA values with the minimum observed value per group

Asked: 2018-02-07 04:56:55

Tags: r dataframe replace data.table

There are some similar questions, but I could not find an answer that fits my case.

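I simply want to replace the missing values (NA) in the mydf data frame with the minimum value observed for each COMPOUND (or for each combination of COMPOUND and SUBJECT).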

Below is my best attempt as an MWE (my real data set is very long, so an efficient approach such as data.table would be best):

set.seed(1)
mydf <- data.frame(SUBJECT=rep(paste('subject',1:10,sep='_'), each=15),
                   TREATMENT=rep(c('A','B'), each=75),
                   TIME=rep(rep(paste('t',0:2,sep=''), each=5), 10),
                   COMPOUND=rep(paste('COMP',1:5,sep='-'), 30),
                   VALUE=rnorm(n=150, mean=20, sd=5))

#INTRODUCE RANDOM NAs
mydf$VALUE[which(mydf$VALUE %in% sample(mydf$VALUE, 15))] <- NA

#CHECK WHICH ARE THE MIN VALUES PER COMPOUND
library(data.table)
DT <- data.table(mydf)
#DT[, list(min.val=min(VALUE, na.rm=TRUE)), by=list(SUBJECT=SUBJECT, COMPOUND=COMPOUND)]
DT[, list(min.val=min(VALUE, na.rm=TRUE)), by=COMPOUND]

#REPLACE MISSING VALUES WITH MIN VALUES PER COMPOUND
#mydf <- as.data.frame(DT[, VALUE2 := min(VALUE[!is.na(VALUE)]), by=list(SUBJECT=SUBJECT, COMPOUND=COMPOUND)])
mydf <- as.data.frame(DT[, VALUE2 := min(VALUE[!is.na(VALUE)]), by=COMPOUND])
mydf

As you can see, in my attempt every value is replaced by the minimum, but I only want to replace the NAs.
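The attempt overwrites everything because `VALUE2 := min(...)` combined with `by` assigns the group minimum to every row of the group. A minimal sketch of one possible fix follows. It uses toy data invented for illustration (not the `mydf` above) and assumes a data.table version that provides `fifelse()`; base `ifelse()` should work the same way here.

```r
library(data.table)

# Toy data (hypothetical, for illustration only): two compounds, some NAs
DT <- data.table(
  COMPOUND = c("COMP-1", "COMP-1", "COMP-1", "COMP-2", "COMP-2", "COMP-2"),
  VALUE    = c(5, NA, 3, NA, 8, 10)
)

# Within each COMPOUND, keep observed values and substitute the group
# minimum (computed without NAs) only where VALUE is missing
DT[, VALUE2 := fifelse(is.na(VALUE), min(VALUE, na.rm = TRUE), VALUE),
   by = COMPOUND]

print(DT)
```

Note that a group containing only NAs would make `min(..., na.rm = TRUE)` return `Inf` with a warning, so such groups need separate handling.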

This is what I want:

> head(mydf, 30)
     SUBJECT TREATMENT TIME COMPOUND     VALUE    VALUE2
1  subject_1         A   t0   COMP-1 16.867731 16.867731
2  subject_1         A   t0   COMP-2 20.918217 20.918217
3  subject_1         A   t0   COMP-3 15.821857 15.821857
4  subject_1         A   t0   COMP-4 27.976404 27.976404
5  subject_1         A   t0   COMP-5 21.647539 21.647539
6  subject_1         A   t1   COMP-1 15.897658 15.897658
7  subject_1         A   t1   COMP-2 22.437145 22.437145
8  subject_1         A   t1   COMP-3 23.691624 23.691624
9  subject_1         A   t1   COMP-4 22.878907 22.878907
10 subject_1         A   t1   COMP-5        NA 11.796972
11 subject_1         A   t2   COMP-1 27.558906 27.558906
12 subject_1         A   t2   COMP-2 21.949216 21.949216
13 subject_1         A   t2   COMP-3 16.893797 16.893797
14 subject_1         A   t2   COMP-4  8.926501  8.926501
15 subject_1         A   t2   COMP-5        NA 11.796972
16 subject_2         A   t0   COMP-1 19.775332 19.775332
17 subject_2         A   t0   COMP-2 19.919049 19.919049
18 subject_2         A   t0   COMP-3 24.719181 24.719181
19 subject_2         A   t0   COMP-4 24.106106 24.106106
20 subject_2         A   t0   COMP-5        NA 11.796972
21 subject_2         A   t1   COMP-1 24.594887 24.594887
22 subject_2         A   t1   COMP-2 23.910682 23.910682
23 subject_2         A   t1   COMP-3 20.372825 20.372825
24 subject_2         A   t1   COMP-4 10.053242 10.053242
25 subject_2         A   t1   COMP-5 23.099129 23.099129
26 subject_2         A   t2   COMP-1        NA 10.428203
27 subject_2         A   t2   COMP-2        NA 10.975207
28 subject_2         A   t2   COMP-3 12.646238 12.646238
29 subject_2         A   t2   COMP-4 17.609250 17.609250
30 subject_2         A   t2   COMP-5 22.089708 22.089708
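The desired output above fills NAs with per-COMPOUND minima; the question also mentions grouping by SUBJECT and COMPOUND. The sketch below wraps the idea in a helper so the grouping can be switched. The helper name `fill_na_with_group_min`, its arguments, and the toy data are all hypothetical, and leaving all-NA groups as NA is an assumption about the preferred behaviour rather than something the question specifies.

```r
library(data.table)

# Hypothetical helper: copies `value_col` into `new_col`, replacing NAs with
# the minimum of the non-missing values within each group defined by `by`.
fill_na_with_group_min <- function(dt, value_col, new_col, by) {
  dt <- copy(dt)  # work on a copy so the caller's table is not modified
  fill_one <- function(x) {
    if (all(is.na(x))) return(x)       # no observed value: leave as NA
    x[is.na(x)] <- min(x, na.rm = TRUE)
    x
  }
  dt[, (new_col) := lapply(.SD, fill_one), by = by, .SDcols = value_col]
  dt[]
}

# Toy data (hypothetical, for illustration only)
toy <- data.table(
  SUBJECT  = c("s1", "s1", "s1", "s1", "s2", "s2", "s2", "s2"),
  COMPOUND = c("A",  "A",  "B",  "B",  "A",  "A",  "B",  "B"),
  VALUE    = c(4,    NA,   7,    9,    1,    6,    NA,   NA)
)

by_compound <- fill_na_with_group_min(toy, "VALUE", "VALUE2", by = "COMPOUND")
by_both     <- fill_na_with_group_min(toy, "VALUE", "VALUE2",
                                      by = c("SUBJECT", "COMPOUND"))

print(by_compound)
print(by_both)
```

With the finer grouping, groups are smaller, so it becomes more likely that a group has no observed value at all; in this sketch those rows simply stay NA.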

0 answers:

No answers yet.