Merging 2 variables into 1?

Time: 2016-08-19 14:49:33

Tags: r merge

Suppose my original data looks like this:

df <- data.frame(id = 1:10,
                 V = LETTERS[1:10],
                 Treatment1 = c(rep(1,3), rep(0,7)),
                 Treatment2 = c(rep(0,3), rep(1,3), rep(0,4)))

I want to merge Treatment1 and Treatment2 into a new variable that takes 1 of 3 values: Treatment1, Treatment2, or Control. That is, I want to end up with this data frame:

df2 <- data.frame(id = 1:10,
                  V = LETTERS[1:10],
                  Treatment = c(rep("Treatment1",3), 
                                rep("Treatment2",3),
                                rep("Control",4)))

Right now I am using this code:

library(dplyr)
df$Treatment <- ifelse(test = df$Treatment1==1, yes = "Treatment1", 
                       no = ifelse(test = df$Treatment2==1, 
                                   yes = "Treatment2", no = "Control"))

df2 <- df %>% select(-Treatment1, -Treatment2)

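Is there a better way?

3 answers:

Answer 0: (score: 3)

One approach that ends up reasonably readable and extensible is to create a lookup table and merge it with the existing data, like this: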

# Lookup table: one row per combination of the two indicator columns
df2 <- data.frame(Treatment1 = c(1,0,0),
                  Treatment2 = c(0,1,0),
                  Treatment = c("Treatment1", "Treatment2", "Control"))
merge(df, df2, all.x=TRUE)  # all.x = TRUE keeps rows of df that have no match in the lookup table

 #      Treatment1 Treatment2 id V  Treatment
 #   1           0          0  7 G    Control
 #   2           0          0  8 H    Control
 #   3           0          0  9 I    Control
 #   4           0          0 10 J    Control
 #   5           0          1  4 D Treatment2
 #   6           0          1  5 E Treatment2
 #   7           0          1  6 F Treatment2
 #   8           1          0  1 A Treatment1
 #   9           1          0  2 B Treatment1
 #   10          1          0  3 C Treatment1
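Since merge() sorts its result by the key columns and keeps the indicator columns, a little cleanup is needed to reach the data frame the question asks for. The following is only a sketch under assumptions: `lookup`, `merged` and `result` are hypothetical names, and the lookup table maps each indicator pair to the label the question expects.

```r
# Rebuild the example data from the question
df <- data.frame(id = 1:10,
                 V = LETTERS[1:10],
                 Treatment1 = c(rep(1, 3), rep(0, 7)),
                 Treatment2 = c(rep(0, 3), rep(1, 3), rep(0, 4)))

# Hypothetical lookup table: one row per combination of the indicators
lookup <- data.frame(Treatment1 = c(1, 0, 0),
                     Treatment2 = c(0, 1, 0),
                     Treatment  = c("Treatment1", "Treatment2", "Control"),
                     stringsAsFactors = FALSE)

# Join on the two indicator columns; all.x = TRUE keeps unmatched rows of df
merged <- merge(df, lookup, by = c("Treatment1", "Treatment2"), all.x = TRUE)

# Restore the original row order and keep only the wanted columns
result <- merged[order(merged$id), c("id", "V", "Treatment")]
rownames(result) <- NULL
result
```

Adding a further treatment arm would then only require one more row in the lookup table (and one more indicator column in both data frames).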

Answer 1: (score: 2)

We can do this without any ifelse:

df$Treatment <- with(df, c("Control", "Treatment1", "Treatment2")[(Treatment1 +
                                2*Treatment2)+1])
df$Treatment
#[1] "Treatment1" "Treatment1" "Treatment1" "Treatment2" "Treatment2" 
#[6] "Treatment2" "Control"    "Control"    "Control"    "Control"   
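To make the index arithmetic easier to follow, here is the same idea spelled out step by step. This is a sketch rather than part of the original answer: `labels` and `idx` are hypothetical names, and the guard assumes that the two indicators are never both 1.

```r
# Rebuild the example data from the question
df <- data.frame(id = 1:10,
                 V = LETTERS[1:10],
                 Treatment1 = c(rep(1, 3), rep(0, 7)),
                 Treatment2 = c(rep(0, 3), rep(1, 3), rep(0, 4)))

labels <- c("Control", "Treatment1", "Treatment2")

# Treatment1 + 2 * Treatment2 is 0 (neither), 1 (Treatment1) or 2 (Treatment2);
# adding 1 turns it into a valid 1-based position in `labels`
idx <- with(df, Treatment1 + 2 * Treatment2) + 1

# Assumption: no row has both indicators set, otherwise idx would be 4
stopifnot(all(idx %in% 1:3))

df$Treatment <- labels[idx]
df$Treatment
```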

Another option is pmax:

c("Control", "Treatment1", "Treatment2")[do.call(pmax, df[3:4]*col(df[3:4]))+1]
#[1] "Treatment1" "Treatment1" "Treatment1" "Treatment2" "Treatment2" 
#[6] "Treatment2" "Control"    "Control"    "Control"    "Control"  
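The one-liner above is compact, so here is a sketch that exposes the intermediate values. The names `ind`, `weighted` and `idx` are hypothetical, and the columns are selected by name instead of by position, which is an assumption about how you would prefer to address them.

```r
# Rebuild the example data from the question
df <- data.frame(id = 1:10,
                 V = LETTERS[1:10],
                 Treatment1 = c(rep(1, 3), rep(0, 7)),
                 Treatment2 = c(rep(0, 3), rep(1, 3), rep(0, 4)))

ind <- df[c("Treatment1", "Treatment2")]

# col(ind) holds the column number of every cell, so column j is multiplied by j:
# Treatment1 stays 0/1 and Treatment2 becomes 0/2
weighted <- ind * col(ind)

# The row-wise maximum is 0, 1 or 2; adding 1 gives a position in the label vector
idx <- do.call(pmax, weighted) + 1

c("Control", "Treatment1", "Treatment2")[idx]
```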

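If we need to compare against 'df2', paste the 3rd and 4th columns of 'df' together into 'v1', name the unique elements of 'Treatment' in 'df2' with the unique elements of 'v1' (in this example they are in the same order), and use that named vector to replace the values.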

v1 <- do.call(paste0, df[3:4])
unname(setNames(as.character(unique(df2$Treatment)), c("10", "01", "00"))[v1])
#[1] "Treatment1" "Treatment1" "Treatment1" "Treatment2" "Treatment2" 
#[6] "Treatment2" "Control"    "Control"    "Control"    "Control"   

Note: none of these approaches uses any packages, and they should be efficient.

Answer 2: (score: 2)

dplyr::case_when is an alternative to nested ifelse calls:

library(dplyr)

df %>% mutate(Treatment = case_when(.$Treatment1 == 1 ~ 'Treatment1', 
                                    .$Treatment2 == 1 ~ 'Treatment2', 
                                    TRUE ~ 'Control')) %>% 
    select(-Treatment1, -Treatment2)
    ##    id V  Treatment
    ## 1   1 A Treatment1
    ## 2   2 B Treatment1
    ## 3   3 C Treatment1
    ## 4   4 D Treatment2
    ## 5   5 E Treatment2
    ## 6   6 F Treatment2
    ## 7   7 G    Control
    ## 8   8 H    Control
    ## 9   9 I    Control
    ## 10 10 J    Control

Because it is still new and somewhat experimental, case_when requires the .$ notation when used inside mutate for now, but it looks like that will change before too long.
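For comparison, later dplyr releases let case_when refer to columns by their bare names inside mutate. The sketch below assumes a reasonably recent dplyr installation, and `result` is a hypothetical name.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(id = 1:10,
                 V = LETTERS[1:10],
                 Treatment1 = c(rep(1, 3), rep(0, 7)),
                 Treatment2 = c(rep(0, 3), rep(1, 3), rep(0, 4)))

# Conditions are checked in order; TRUE acts as the catch-all for Control
result <- df %>%
  mutate(Treatment = case_when(Treatment1 == 1 ~ "Treatment1",
                               Treatment2 == 1 ~ "Treatment2",
                               TRUE ~ "Control")) %>%
  select(-Treatment1, -Treatment2)

result
```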