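R: find the value in one column closest to another value and return the results in a new data frame

Date: 2013-05-26 16:24:31

Tags: r dataframe

My data is in the following format: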

structure(list(cat = structure(c(1L, 2L, 3L, 1L, 2L, 2L, 3L, 
3L, 3L, 3L, 1L, 2L), .Label = c("A", "B", "C"), class = "factor"), 
ID = structure(c(1L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 2L, 
3L, 4L), .Label = c("s1", "s10", "s11", "s12", "s2", "s3", 
"s4", "s5", "s6", "s7", "s8", "s9"), class = "factor"), val = c(150, 
750, 950, 104, 726, 797, 890, 912, 994, 1004, 199, 704), 
LWR = c(100, 700, 900, NA, NA, NA, NA, NA, NA, NA, NA, NA
), UPP = c(200, 800, 1000, NA, NA, NA, NA, NA, NA, NA, NA, 
NA)), .Names = c("cat", "ID", "val", "LWR", "UPP"), row.names = c(NA, 
-12L), class = "data.frame")

It looks like this:

    cat ID  val LWR  UPP
1    A  s1  150 100  200
2    B  s2  750 700  800
3    C  s3  950 900 1000
4    A  s4  104  NA   NA
5    B  s5  726  NA   NA
6    B  s6  797  NA   NA
7    C  s7  890  NA   NA
8    C  s8  912  NA   NA
9    C  s9  994  NA   NA
10   C s10 1004  NA   NA
11   A s11  199  NA   NA
12   B s12  704  NA   NA

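For each row that has LWR and UPP set, I want to find the value in the val column, among rows with the same cat, that is closest to LWR, and likewise the one closest to UPP. It is probably easiest to understand by looking at the desired output: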

  cat id val LWR  UPP  LS NLWR  US NUPP
1   A s1 150 100  200  s4  104 s11  199
2   B s2 750 700  800 s12  704  s6  797
3   C s3 950 900 1000  s8  912  s9  994

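The new columns (LS and NLWR / US and NUPP) hold the ID and val of the extracted rows, just under new column names. I have tried various forms of `which` to get this working and then reshape the data, but without any luck. Is there a straightforward way to do this, or will it always require multiple steps?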

1 answer:

Answer 0: (score: 1)

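# DF is the data frame from the question.
DF1 <- na.omit(DF)            # reference rows: the ones with LWR/UPP set
DF2 <- DF[is.na(DF$LWR), ]    # candidate rows: the ones to search

library(plyr)

# Assumes one reference row per cat, so df has exactly one row.
ddply(DF1, .(cat), function(df) {
  # only consider candidates from the same cat as the reference row
  cand <- DF2[as.character(DF2$cat) == as.character(df$cat[1]), ]

  # position of the candidate whose val is nearest to each bound
  lwr <- which.min(abs(cand$val - df$LWR))
  upp <- which.min(abs(cand$val - df$UPP))

  df$LS <- cand[lwr, "ID"]
  df$NLWR <- cand[lwr, "val"]
  df$US <- cand[upp, "ID"]
  df$NUPP <- cand[upp, "val"]

  df
})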

#   cat ID val LWR  UPP  LS NLWR  US NUPP
# 1   A s1 150 100  200  s4  104 s11  199
# 2   B s2 750 700  800 s12  704  s6  797
# 3   C s3 950 900 1000  s7  890 s10 1004

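Note that 890 is closer to 900 than 912 is (a distance of 10 versus 12), and the same holds for NUPP: 1004 is closer to 1000 than 994 is. That is why the row for cat C differs from the desired output in the question. If the values must lie between LWR and UPP, this should be easy to adjust.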
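
If the matches must lie between LWR and UPP, one possible adjustment is to filter the candidates to that interval before taking the nearest value. The following is a minimal base-R sketch under that assumption, not the answerer's own code: the helper name `nearest_within` and the variables `refs` and `cands` are hypothetical, the data frame is rebuilt compactly from the values in the question, and the bounds are treated as inclusive.

```r
# Rebuild the question's data in compact form (same values as the dput output).
DF <- data.frame(
  cat = c("A", "B", "C", "A", "B", "B", "C", "C", "C", "C", "A", "B"),
  ID  = paste0("s", 1:12),
  val = c(150, 750, 950, 104, 726, 797, 890, 912, 994, 1004, 199, 704),
  LWR = c(100, 700, 900, rep(NA, 9)),
  UPP = c(200, 800, 1000, rep(NA, 9)),
  stringsAsFactors = FALSE
)

refs  <- DF[!is.na(DF$LWR), ]   # rows that carry the bounds
cands <- DF[is.na(DF$LWR), ]    # rows to search

# nearest_within() is a hypothetical helper name used for illustration.
# ref is a single reference row.
nearest_within <- function(ref) {
  # keep candidates of the same cat whose val lies inside [LWR, UPP]
  ok <- cands[cands$cat == ref$cat &
                cands$val >= ref$LWR & cands$val <= ref$UPP, ]

  # no candidate inside the interval: fill the new columns with NA
  if (nrow(ok) == 0) {
    return(data.frame(ref, LS = NA_character_, NLWR = NA_real_,
                      US = NA_character_, NUPP = NA_real_))
  }

  lo <- ok[which.min(abs(ok$val - ref$LWR)), ]  # nearest to the lower bound
  hi <- ok[which.min(abs(ok$val - ref$UPP)), ]  # nearest to the upper bound
  data.frame(ref, LS = lo$ID, NLWR = lo$val, US = hi$ID, NUPP = hi$val)
}

# apply the helper to each reference row and stack the results
res <- do.call(rbind, lapply(seq_len(nrow(refs)),
                             function(i) nearest_within(refs[i, ])))
res
```

With the question's data, cat C now picks s8 (912) and s9 (994), because 890 and 1004 fall outside the interval from 900 to 1000, so `res` matches the desired output shown in the question.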