I am trying to create a new variable based on some conditions. I have the following data:
df <- data.frame(ID = c("A1","A1","A2","A2","A3","A4","A4"),
type = c("small","large","small","large","large","small","large"),
code = c("B9", "[0,20]","B9","[20,40]","[0,20]","B9","[40,60]" ))
which gives:
ID type code
1 A1 small B9
2 A1 large [0,20]
3 A2 small B9
4 A2 large [20,40]
5 A3 large [0,20]
6 A4 small B9
7 A4 large [40,60]
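I want to create a new variable, code2, that takes the value of code from the row where type == "large", grouped by ID. So every row for ID A1 should have [0,20] as its code2. This is the result I want: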
ID type code code2
1 A1 small B9 [0,20]
2 A1 large [0,20] [0,20]
3 A2 small B9 [20,40]
4 A2 large [20,40] [20,40]
5 A3 large [0,20] [0,20]
6 A4 small B9 [40,60]
7 A4 large [40,60] [40,60]
I have been trying to do this with dplyr and ifelse, but without success.
Answer 0 (score: 4)
We can use a group-by operation in dplyr: group by 'ID' and extract the 'code' value where 'type' is "large" (this assumes there are no duplicate 'type' values within each 'ID'):
library(dplyr)
df <- df %>%
group_by(ID) %>%
mutate(code2 = code[type == 'large']) %>%
ungroup
Output:
df
# A tibble: 7 x 4
ID type code code2
<chr> <chr> <chr> <chr>
1 A1 small B9 [0,20]
2 A1 large [0,20] [0,20]
3 A2 small B9 [20,40]
4 A2 large [20,40] [20,40]
5 A3 large [0,20] [0,20]
6 A4 small B9 [40,60]
7 A4 large [40,60] [40,60]
If there are duplicates, use match, which returns the index of the first match:
df <- df %>%
group_by(ID) %>%
mutate(code2 = code[match('large', type)]) %>%
ungroup
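To illustrate why match is the safer choice, here is a small base R check on a hypothetical group. The values below are made up for illustration and are not taken from the question's data: one group has two "large" rows, and another has none.

```r
# Hypothetical group: one "small" row and two "large" rows (illustrative values)
type <- c("small", "large", "large")
code <- c("B9", "[0,20]", "[20,40]")

# Logical subsetting keeps every "large" value: a length-2 result,
# which mutate() rejects for a 3-row group (it only recycles length 1)
all_large <- code[type == "large"]
print(all_large)

# match() returns only the position of the first "large"
first_idx <- match("large", type)
print(first_idx)
print(code[first_idx])

# Hypothetical group with no "large" row: match() returns NA,
# so code[NA] gives NA rather than a zero-length vector
no_large <- c("small", "small")
print(match("large", no_large))
```

So with the match version, an ID that has no "large" row simply gets NA in code2. The logical-subsetting version would instead return a zero-length result for that group, which mutate() rejects with a size error.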
Answer 1 (score: 1)
A data.table option:
> setDT(df)[, code2 := code[type == "large"], ID][]
ID type code code2
1: A1 small B9 [0,20]
2: A1 large [0,20] [0,20]
3: A2 small B9 [20,40]
4: A2 large [20,40] [20,40]
5: A3 large [0,20] [0,20]
6: A4 small B9 [40,60]
7: A4 large [40,60] [40,60]