My data has the following format:
structure(list(cat = structure(c(1L, 2L, 3L, 1L, 2L, 2L, 3L,
3L, 3L, 3L, 1L, 2L), .Label = c("A", "B", "C"), class = "factor"),
ID = structure(c(1L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 2L,
3L, 4L), .Label = c("s1", "s10", "s11", "s12", "s2", "s3",
"s4", "s5", "s6", "s7", "s8", "s9"), class = "factor"), val = c(150,
750, 950, 104, 726, 797, 890, 912, 994, 1004, 199, 704),
LWR = c(100, 700, 900, NA, NA, NA, NA, NA, NA, NA, NA, NA
), UPP = c(200, 800, 1000, NA, NA, NA, NA, NA, NA, NA, NA,
NA)), .Names = c("cat", "ID", "val", "LWR", "UPP"), row.names = c(NA,
-12L), class = "data.frame")
which looks like this:
cat ID val LWR UPP
1 A s1 150 100 200
2 B s2 750 700 800
3 C s3 950 900 1000
4 A s4 104 NA NA
5 B s5 726 NA NA
6 B s6 797 NA NA
7 C s7 890 NA NA
8 C s8 912 NA NA
9 C s9 994 NA NA
10 C s10 1004 NA NA
11 A s11 199 NA NA
12 B s12 704 NA NA
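For each row that has LWR and UPP values, I want to find the value in the val column, among rows with the same cat, that is closest to LWR, and likewise the one closest to UPP. It is probably easiest to understand by looking at the desired output: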
cat id val LWR UPP LS NLWR US NUPP
1 A s1 150 100 200 s4 104 s11 199
2 B s2 750 700 800 s12 704 s6 797
3 C s3 950 900 1000 s8 912 s9 994
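The new columns (LS and NLWR / US and NUPP) hold the same id and val as in the rows they were extracted from, just under new column names. I have tried running various forms of `which` and then reshaping the data, but without any luck. Is there a direct way to do this, or will it always require multiple steps?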
Answer 0 (score: 1)
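# DF is the data frame from the question.
DF1 <- na.omit(DF)            # reference rows that carry LWR/UPP
DF2 <- DF[is.na(DF$LWR), ]    # candidate rows to search
library(plyr)
ddply(DF1, .(cat), function(df) {
  # restrict the candidates to the same cat as the reference row
  cand <- DF2[DF2$cat == as.character(df$cat[1]), ]
  lwr <- which.min(abs(cand$val - df$LWR))
  upp <- which.min(abs(cand$val - df$UPP))
  df$LS   <- cand[lwr, "ID"]
  df$NLWR <- cand[lwr, "val"]
  df$US   <- cand[upp, "ID"]
  df$NUPP <- cand[upp, "val"]
  df
})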
# cat ID val LWR UPP LS NLWR US NUPP
# 1 A s1 150 100 200 s4 104 s11 199
# 2 B s2 750 700 800 s12 704 s6 797
# 3 C s3 950 900 1000 s7 890 s10 1004
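Note that 890 is closer to 900 than 912 is, and the same holds for NUPP (1004 is closer to 1000 than 994 is), so row C differs from your desired output. If the values have to lie between LWR and UPP, this should be easy to adjust.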
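As a rough sketch of that adjustment, the version below only considers candidates whose val lies inside [LWR, UPP] before picking the nearest one. It is written in base R rather than plyr, rebuilds the question's data with plain strings instead of factors, and treats the bounds as inclusive; all three are assumptions made for the illustration. `nearest_within` is a hypothetical helper name, not something from the code above.

```r
# Data from the question, rebuilt compactly (cat/ID kept as plain strings).
DF <- data.frame(
  cat = c("A", "B", "C", "A", "B", "B", "C", "C", "C", "C", "A", "B"),
  ID  = paste0("s", 1:12),
  val = c(150, 750, 950, 104, 726, 797, 890, 912, 994, 1004, 199, 704),
  LWR = c(100, 700, 900, rep(NA, 9)),
  UPP = c(200, 800, 1000, rep(NA, 9)),
  stringsAsFactors = FALSE
)

ref  <- DF[!is.na(DF$LWR), ]   # rows that define LWR/UPP
cand <- DF[is.na(DF$LWR), ]    # rows to search

# Hypothetical helper: r is a single reference row.
nearest_within <- function(r) {
  # keep candidates of the same cat whose val lies inside [LWR, UPP]
  # (inclusive bounds are an assumption)
  ok <- cand[cand$cat == r$cat & cand$val >= r$LWR & cand$val <= r$UPP, ]
  if (nrow(ok) == 0) {
    # no candidate inside the interval: fill the new columns with NA
    return(cbind(r, LS = NA_character_, NLWR = NA_real_,
                 US = NA_character_, NUPP = NA_real_))
  }
  lwr <- which.min(abs(ok$val - r$LWR))
  upp <- which.min(abs(ok$val - r$UPP))
  cbind(r, LS = ok$ID[lwr], NLWR = ok$val[lwr],
        US = ok$ID[upp], NUPP = ok$val[upp])
}

# apply the helper to each reference row and stack the results
res <- do.call(rbind, lapply(seq_len(nrow(ref)),
                             function(i) nearest_within(ref[i, ])))
print(res)
```

With the interval restriction in place, row C picks s8 (912) and s9 (994), which matches the desired output in the question.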