R: Fill a new column by matching another column

Asked: 2017-04-25 08:28:36

Tags: r

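This has probably been asked many times, but the cases I found were more complex than mine, and I really don't know where to start. I need to add a new column (Condition) to my data frame and fill its rows based on the values in the column cellNr.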

My data frame is molten.pC:

  cellNr     value
1    G63  0.000000
2    G64  8.848623
3    G65  0.000000
4    G66 10.788718
5    B15  5.285402
6    B16  0.000000
7    B17  0.000000
8    C10  0.000000
9    C11  0.000000

I want to add a column Condition and fill it as follows:

  cellNr     value    Condition
1    G63  0.000000       Growth
2    G64  8.848623       Growth
3    G65  0.000000       Growth
4    G66 10.788718       Growth
5    B15  5.285402        Burst
6    B16  0.000000        Burst
7    B17  0.000000        Burst
8    C10  0.000000 Cellularized
9    C11  0.000000 Cellularized

1 answer:

Answer 0 (score: 2)

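We can do this in base R by extracting the first character with substr, converting it to a factor, and specifying the levels and labels. The outer as.character turns the result back into a plain character column.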

molten.pC$Condition <- as.character(factor(substr(molten.pC$cellNr, 1, 1), 
      levels = c("G", "B", "C"), labels = c("Growth", "Burst", "Cellularized")))
molten.pC$Condition
#[1] "Growth"       "Growth"       "Growth"       "Growth"       "Burst"
#[6] "Burst"        "Burst"        "Cellularized" "Cellularized"
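
As a variation on the same idea, the prefix-to-label mapping could also be kept in a named character vector and applied by indexing. This is only a sketch, not part of the original answer: the name condition_lookup is made up for illustration, and the data frame is rebuilt from the values shown in the question so the snippet runs on its own.

```r
# Rebuild the example data from the question so the snippet is self-contained
molten.pC <- data.frame(
  cellNr = c("G63", "G64", "G65", "G66", "B15", "B16", "B17", "C10", "C11"),
  value  = c(0, 8.848623, 0, 10.788718, 5.285402, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical lookup vector: names are the prefixes, values are the labels
condition_lookup <- c(G = "Growth", B = "Burst", C = "Cellularized")

# First character of each cellNr, used as the lookup key
prefix <- substr(molten.pC$cellNr, 1, 1)

# Index the lookup by name; unname() drops the prefix names from the result
molten.pC$Condition <- unname(condition_lookup[prefix])
molten.pC
```

A prefix that is missing from the lookup vector comes back as NA, which makes unexpected cellNr values easy to spot.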

Alternatively, we can use case_when from dplyr:

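library(dplyr) # devel version at the time of writing (soon to be released as `0.6.0`)
molten.pC %>%
  # temporary helper column holding the first character of cellNr
  mutate(Sub = substr(cellNr, 1, 1),
         Condition = case_when(Sub == "G" ~ "Growth",
                               Sub == "B" ~ "Burst",
                               # catch-all: every other prefix becomes "Cellularized"
                               TRUE ~ "Cellularized")) %>%
  select(-Sub) # drop the helper column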
#  cellNr     value    Condition
#1    G63  0.000000       Growth
#2    G64  8.848623       Growth
#3    G65  0.000000       Growth
#4    G66 10.788718       Growth
#5    B15  5.285402        Burst
#6    B16  0.000000        Burst
#7    B17  0.000000        Burst
#8    C10  0.000000 Cellularized
#9    C11  0.000000 Cellularized
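
Note that the TRUE branch above labels every prefix other than G or B as Cellularized. If the data might contain other prefixes, a stricter variant could spell out all three conditions and fall back to NA. The following is a sketch under that assumption; the row X01 is a hypothetical value added only to show the fallback, and the result name res is likewise made up.

```r
library(dplyr)

# Small example with one hypothetical unmatched row ("X01")
molten.pC <- data.frame(
  cellNr = c("G63", "B15", "C10", "X01"),
  value  = c(0, 5.285402, 0, 1),
  stringsAsFactors = FALSE
)

res <- molten.pC %>%
  mutate(
    Sub = substr(cellNr, 1, 1),
    Condition = case_when(
      Sub == "G" ~ "Growth",
      Sub == "B" ~ "Burst",
      Sub == "C" ~ "Cellularized",
      TRUE ~ NA_character_  # unknown prefixes stay NA instead of being mislabeled
    )
  ) %>%
  select(-Sub)
res
```

NA_character_ is used rather than a bare NA so that every branch of case_when returns the same type.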