Suppose my raw data looks like this:
df <- data.frame(id = 1:10,
                 V = LETTERS[1:10],
                 Treatment1 = c(rep(1,3), rep(0,7)),
                 Treatment2 = c(rep(0,3), rep(1,3), rep(0,4)))
I want to combine Treatment1 and Treatment2 into a single new variable that takes one of three values: Treatment1, Treatment2, or Control. That is, I want to end up with this data frame:
df2 <- data.frame(id = 1:10,
                  V = LETTERS[1:10],
                  Treatment = c(rep("Treatment1",3),
                                rep("Treatment2",3),
                                rep("Control",4)))
Right now I am using this code:
library(dplyr)
df$Treatment <- ifelse(test = df$Treatment1==1, yes = "Treatment1",
                       no = ifelse(test = df$Treatment2==1,
                                   yes = "Treatment2", no = "Control"))
df2 <- df %>% select(-Treatment1, -Treatment2)
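Is there a better way?
Answer 0 (score: 3)
One approach that ends up being reasonably readable and extensible is to create a lookup table and merge it with the existing data, like this: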
# One row per combination of the indicator columns; the labels must line up with the indicators
lookup <- data.frame(Treatment1 = c(1,0,0),
                     Treatment2 = c(0,1,0),
                     Treatment = c("Treatment1", "Treatment2", "Control"))
merge(df, lookup, all.x=TRUE) # Setting all.x ensures rows of df aren't dropped if there isn't a match
#    Treatment1 Treatment2 id V  Treatment
# 1           0          0  7 G    Control
# 2           0          0  8 H    Control
# 3           0          0  9 I    Control
# 4           0          0 10 J    Control
# 5           0          1  4 D Treatment2
# 6           0          1  5 E Treatment2
# 7           0          1  6 F Treatment2
# 8           1          0  1 A Treatment1
# 9           1          0  2 B Treatment1
# 10          1          0  3 C Treatment1
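As a minimal sketch of the same lookup-table idea, the version below names the join columns explicitly, restores the original row order (merge() sorts by the key columns), and keeps only the columns the question asks for. The names lookup and merged are illustrative and not part of the original answer.

```r
df <- data.frame(id = 1:10,
                 V = LETTERS[1:10],
                 Treatment1 = c(rep(1, 3), rep(0, 7)),
                 Treatment2 = c(rep(0, 3), rep(1, 3), rep(0, 4)))

# Hypothetical lookup table: one row per valid combination of the indicators
lookup <- data.frame(Treatment1 = c(1, 0, 0),
                     Treatment2 = c(0, 1, 0),
                     Treatment  = c("Treatment1", "Treatment2", "Control"))

# Join on the two indicator columns only; all.x = TRUE keeps unmatched rows
merged <- merge(df, lookup, by = c("Treatment1", "Treatment2"), all.x = TRUE)

# Restore the original row order and drop the indicator columns
merged <- merged[order(merged$id), c("id", "V", "Treatment")]
rownames(merged) <- NULL

# A combination missing from the lookup (e.g. both indicators equal to 1)
# would show up as NA here, so it is worth checking
anyNA(merged$Treatment)
```

Adding a new treatment arm then only requires one more row in the lookup table, which is where this approach tends to scale better than nested conditionals.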
Answer 1 (score: 2)
We can do this without any ifelse:
df$Treatment <- with(df, c("Control", "Treatment1", "Treatment2")[(Treatment1 +
2*Treatment2)+1])
df$Treatment
#[1] "Treatment1" "Treatment1" "Treatment1" "Treatment2" "Treatment2"
#[6] "Treatment2" "Control" "Control" "Control" "Control"
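To see why the arithmetic index works, the sketch below tabulates every combination of the two indicators. The data frame combos and the vector labels are illustrative names, not part of the original answer.

```r
labels <- c("Control", "Treatment1", "Treatment2")

# Every possible combination of the two 0/1 indicators
combos <- data.frame(Treatment1 = c(0, 1, 0, 1),
                     Treatment2 = c(0, 0, 1, 1))

# (0,0) -> 1, (1,0) -> 2, (0,1) -> 3, (1,1) -> 4
combos$index <- combos$Treatment1 + 2 * combos$Treatment2 + 1

# Index 4 is out of range for a length-3 vector, so that row becomes NA
combos$label <- labels[combos$index]
combos
```

The mapping is only valid while the two indicators are mutually exclusive; a row with both set to 1 yields NA rather than an error.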
Another option is pmax:
c("Control", "Treatment1", "Treatment2")[do.call(pmax, df[3:4]*col(df[3:4]))+1]
#[1] "Treatment1" "Treatment1" "Treatment1" "Treatment2" "Treatment2"
#[6] "Treatment2" "Control" "Control" "Control" "Control"
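If we need to compare with 'df2', paste the 3rd and 4th columns of 'df' together to get 'v1', name the unique elements of 'Treatment' in 'df2' with the unique elements of 'v1' (in the example they are in the same order), and use that named vector to replace the values.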
v1 <- do.call(paste0, df[3:4])
unname(setNames(as.character(unique(df2$Treatment)), c("10", "01", "00"))[v1])
#[1] "Treatment1" "Treatment1" "Treatment1" "Treatment2" "Treatment2"
#[6] "Treatment2" "Control" "Control" "Control" "Control"
Note: none of these methods use any packages, and they should be efficient.
Answer 2 (score: 2)
dplyr::case_when is an alternative to nested ifelse:
library(dplyr)
df %>% mutate(Treatment = case_when(.$Treatment1 == 1 ~ 'Treatment1',
                                    .$Treatment2 == 1 ~ 'Treatment2',
                                    TRUE ~ 'Control')) %>%
    select(-Treatment1, -Treatment2)
##    id V  Treatment
## 1   1 A Treatment1
## 2   2 B Treatment1
## 3   3 C Treatment1
## 4   4 D Treatment2
## 5   5 E Treatment2
## 6   6 F Treatment2
## 7   7 G    Control
## 8   8 H    Control
## 9   9 I    Control
## 10 10 J    Control
Since it is still new and somewhat experimental, case_when requires the .$ notation inside mutate for now, but it looks like that will change before too long.
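In more recent dplyr releases, case_when can refer to columns by bare name inside mutate, so the .$ prefix is no longer needed. A sketch, assuming a reasonably current dplyr is installed (the name result is illustrative):

```r
library(dplyr)

df <- data.frame(id = 1:10,
                 V = LETTERS[1:10],
                 Treatment1 = c(rep(1, 3), rep(0, 7)),
                 Treatment2 = c(rep(0, 3), rep(1, 3), rep(0, 4)))

# Conditions are evaluated in order; the first TRUE one wins,
# and the final TRUE acts as the catch-all "else" branch
result <- df %>%
  mutate(Treatment = case_when(Treatment1 == 1 ~ "Treatment1",
                               Treatment2 == 1 ~ "Treatment2",
                               TRUE ~ "Control")) %>%
  select(-Treatment1, -Treatment2)

result
```

Because the conditions are checked top to bottom, a row with both indicators set to 1 would be labelled Treatment1 here, which differs from the lookup-table and index approaches that return NA for such rows.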