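I am trying to write code that compares two columns in the same data frame and uses summarise to create a new column stating whether each ID was registered before the review took place.
Here is my data frame: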
tt <- structure(list(ID = c("P40", "P40", "P40", "P42", "P42", "P43", "P43",
"P44", "P44"),Type = c("Pre-Initial", "Review", "Review", "Initial", "Review", "Initial", "Review", "Pre-Initial", "Review"),
Registered = c("Yes", "", "", "No", "", "Yes", "", "No", "")),
class = "data.frame", row.names = c(NA, -9L))
The result I want to achieve:
ID Outcome
P40 Yes
P42 No
P43 Yes
P44 No
This is the code I tried, but it returns "No" for every ID:
tt %>% group_by(ID) %>%
summarise(outcome = c("No", "Yes")[all(Registered == "Yes" & Type == "Review") + 1])
Answer 0 (score: 2)
You could try:
tt %>%
group_by(ID) %>%
summarise(
Outcome = c("No", "Yes")[any(Type == "Review" & cumsum(Registered == "Yes") == 1) + 1]
)
Output:
# A tibble: 4 x 2
ID Outcome
<chr> <chr>
1 P40 Yes
2 P42 No
3 P43 Yes
4 P44 No
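Note that this assumes a Registered value of "Yes" occurs only once per ID. Otherwise, the condition cumsum(Registered == "Yes") == 1 needs to be replaced with one that does not require exactly one match.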
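If an ID can have more than one "Yes" in Registered, one possible substitute for the == 1 condition is >= 1, which only asks that at least one "Yes" has appeared before a Review row. This is a sketch under that assumption, not necessarily the replacement the answer had in mind; the data frame tt2 and its extra "Yes" row are made up for illustration.

```r
library(dplyr)

# Hypothetical data: P40 has two "Yes" rows before its reviews
tt2 <- data.frame(
  ID = c("P40", "P40", "P40", "P40", "P42", "P42"),
  Type = c("Pre-Initial", "Initial", "Review", "Review", "Initial", "Review"),
  Registered = c("Yes", "Yes", "", "", "No", ""),
  stringsAsFactors = FALSE
)

res_multi <- tt2 %>%
  group_by(ID) %>%
  summarise(
    # cumsum(...) >= 1 is TRUE from the first "Yes" onwards,
    # so any Review row after that point marks the ID as "Yes"
    Outcome = c("No", "Yes")[any(Type == "Review" & cumsum(Registered == "Yes") >= 1) + 1]
  )

print(res_multi)
```

With == 1 the running count on P40's Review rows would already be 2, so the original condition would classify P40 as "No".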
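Answer 1 (score: 2)
Another dplyr variant: it returns "No" if there is no "Yes" value in Registered; otherwise it compares the index of that occurrence with the index of "Review" and assigns the value accordingly.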
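The code for this answer is not present here, so the following is only a sketch of the approach described, not the answerer's original code. It uses which.max() on a logical vector to get the position of the first TRUE. Treating an ID with no Review row as "No" is an assumption, and res_idx is an illustrative name.

```r
library(dplyr)

# Data frame from the question
tt <- structure(list(
  ID = c("P40", "P40", "P40", "P42", "P42", "P43", "P43", "P44", "P44"),
  Type = c("Pre-Initial", "Review", "Review", "Initial", "Review",
           "Initial", "Review", "Pre-Initial", "Review"),
  Registered = c("Yes", "", "", "No", "", "Yes", "", "No", "")),
  class = "data.frame", row.names = c(NA, -9L))

res_idx <- tt %>%
  group_by(ID) %>%
  summarise(
    Outcome = if (!any(Registered == "Yes") || !any(Type == "Review")) {
      # No "Yes" at all (or, by assumption, no review to compare against)
      "No"
    } else if (which.max(Registered == "Yes") < which.max(Type == "Review")) {
      # First "Yes" comes before the first "Review" within this ID
      "Yes"
    } else {
      "No"
    }
  )

print(res_idx)
```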
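Answer 2 (score: 0)
I am not sure what your expected result is, but from your description it seems the Type == 'Review' rows do not matter at all: you need to remove them, then drop the Type column (and rename the Registered column):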
tt %>%
filter(Type != 'Review') %>%
select(- Type, Outcome = Registered)