I have a data frame like the following:
ID STATUS
1638483 Very bad
1407499 Very good
1383920 Good
1407499 Bad
The first column contains ID; some values are unique and some are not.
The second column contains STATUS, which can be "Very good", "Good", "Bad" or "Very bad".
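I would like to:
- keep the rows with a unique ID (the STATUS does not matter here), for example the rows containing ID 1638483 or 1383920;
- for a duplicated ID, keep only the row with the best status, for example ID 1407499.
The desired output is: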
ID STATUS
1638483 Very bad
1407499 Very good
1383920 Good
I tried using the dplyr package. I managed to group the data by ID, but then I got stuck.
Answer 0 (score: 2)
One possible solution using dplyr:
library(dplyr)

# create tibble
df <- tibble(
  id = c("1638483", "1407499", "1383920", "1407499"),
  status = c("Very bad", "Very good", "Good", "Bad")
)

# solution: order the status levels from worst to best, sort best-first,
# then keep the rows matching the first (best) status within each id
df %>%
  mutate(status = factor(status,
                         levels = c("Very bad", "Bad", "Good", "Very good"))) %>%
  arrange(desc(status)) %>%
  group_by(id) %>%
  filter(status == status[1]) %>%
  ungroup()
Result:
# A tibble: 3 x 2
id status
<chr> <fctr>
1 1383920 Good
2 1407499 Very good
3 1638483 Very bad
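The same idea can probably be expressed more compactly with slice_max(), which is available in dplyr 1.0.0 and later. The following is only a sketch and not part of the original answer: it assumes the same worst-to-best ordering of the statuses, and the names status_levels and best are purely illustrative.

```r
library(dplyr)

df <- tibble(
  id = c("1638483", "1407499", "1383920", "1407499"),
  status = c("Very bad", "Very good", "Good", "Bad")
)

# Assumed ordering of the statuses, from worst to best
status_levels <- c("Very bad", "Bad", "Good", "Very good")

best <- df %>%
  mutate(status = factor(status, levels = status_levels)) %>%
  group_by(id) %>%
  # rank rows by the factor's integer code and keep exactly one row per id
  slice_max(as.integer(status), n = 1, with_ties = FALSE) %>%
  ungroup()
```

Note that with_ties = FALSE returns a single row per id even when the best status occurs several times, whereas the filter() version above keeps every tied row.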
Answer 1 (score: 1)
Convert STATUS to a factor with the levels in the required order, then use ave:
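# order the levels from worst to best so that a larger integer code means a better status
df$STATUS = factor(df$STATUS, levels = c("Very bad", "Bad", "Good", "Very good"))
# within each ID, flag the rows whose status code equals the group maximum, then keep them
df[ave(as.numeric(df$STATUS), df$ID, FUN = function(x) x == max(x)) == 1,]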
# ID STATUS
#1 1638483 Very bad
#2 1407499 Very good
#3 1383920 Good
Data
df = structure(list(ID = c(1638483L, 1407499L, 1383920L, 1407499L),
STATUS = c("Very bad", "Very good", "Good", "Bad")), .Names = c("ID",
"STATUS"), class = "data.frame", row.names = c(NA, -4L))
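A base R variant that avoids ave is to sort the rows so that the best status comes first within each ID and then drop the later duplicates. This is a minimal sketch under the same assumed level ordering, not something from the original answer; df_sorted and result are illustrative names, and the data frame is rebuilt here so the snippet runs on its own.

```r
df <- data.frame(
  ID = c(1638483L, 1407499L, 1383920L, 1407499L),
  STATUS = c("Very bad", "Very good", "Good", "Bad"),
  stringsAsFactors = FALSE
)

# Assumed ordering of the statuses, from worst to best
df$STATUS <- factor(df$STATUS, levels = c("Very bad", "Bad", "Good", "Very good"))

# Sort by ID, and within each ID from best status to worst
df_sorted <- df[order(df$ID, -as.integer(df$STATUS)), ]

# duplicated() flags every occurrence after the first, so only the best row per ID is kept
result <- df_sorted[!duplicated(df_sorted$ID), ]
```

Unlike the ave approach, this returns the rows sorted by ID rather than in their original order, and it keeps a single row per ID even when the best status is tied.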