R - Selecting specific rows within groups of rows

Time: 2017-10-26 15:35:48

Tags: r dataframe group-by row

I have a data frame like the one below:

     ID          STATUS
1638483        Very bad
1407499       Very good
1383920            Good
1407499             Bad

The first column contains the IDs; some are unique and some are not. The second column contains the STATUS, which can be "Very good", "Good", "Bad" or "Very bad".

I want to:

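  • keep the rows with a unique ID (the STATUS does not matter here): e.g., the rows with IDs 1638483 and 1383920;
  • for IDs that appear more than once, keep only the row with the best status: e.g., ID 1407499 (keep "Very good", drop "Bad").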
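Picking the "best" status requires an explicit ranking. A plain `factor()` sorts its levels alphabetically ("Bad", "Good", "Very bad", "Very good"), which is not the intended order. A minimal sketch, assuming the ranking Very bad < Bad < Good < Very good; the names `status_levels` and `ranked` are only illustrative:

```r
# Assumed ranking, worst to best
status_levels <- c("Very bad", "Bad", "Good", "Very good")

status <- c("Very bad", "Very good", "Good", "Bad")

# Default factor(): levels are sorted alphabetically, not by quality
default_levels <- levels(factor(status))
default_levels

# Ordered factor: max() and comparisons follow the ranking above
ranked <- factor(status, levels = status_levels, ordered = TRUE)
max(ranked)
```

With an ordered factor, `max()` returns "Very good", whereas calling `max()` on an unordered factor is an error.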

The desired output is:

     ID          STATUS
1638483        Very bad
1407499       Very good
1383920            Good

I tried using the dplyr package. I managed to group the data by ID, but then I got stuck.

2 Answers:

Answer 0 (score: 2)

One possible solution using dplyr:

library(dplyr)

# create tibble
df <- tibble(
  id = c("1638483", "1407499", "1383920", "1407499"),
  status = c("Very bad", "Very good", "Good", "Bad")
)

# solution: rank the statuses, sort best-first,
# then keep the row(s) matching the first (best) status of each id
df %>%
  mutate(status = factor(status,
                         levels = c("Very bad", "Bad", "Good", "Very good"))) %>%
  arrange(desc(status)) %>%
  group_by(id) %>%
  filter(status == status[1]) %>%
  ungroup()

Result:

# A tibble: 3 x 2
       id    status
    <chr>    <fctr>
1 1383920      Good
2 1407499 Very good
3 1638483  Very bad
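If the factor is created with `ordered = TRUE`, `max()` is defined on it, so the `arrange()` step can be dropped and each row compared against its group's maximum instead. A sketch under that assumption, reusing the same data; like `status == status[1]`, it keeps every row that ties for the best status. Because `filter()` keeps the input row order, the rows come back as 1638483, 1407499, 1383920 rather than sorted by id.

```r
library(dplyr)

df <- tibble(
  id = c("1638483", "1407499", "1383920", "1407499"),
  status = c("Very bad", "Very good", "Good", "Bad")
)

best <- df %>%
  # ordered = TRUE makes max() and comparisons follow the ranking
  mutate(status = factor(status,
                         levels = c("Very bad", "Bad", "Good", "Very good"),
                         ordered = TRUE)) %>%
  group_by(id) %>%
  # keep every row whose status equals the best status within its id
  filter(status == max(status)) %>%
  ungroup()

best
```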

Answer 1 (score: 1)

Convert STATUS to a factor whose levels are in the desired order, then use ave:

df$STATUS = factor(df$STATUS, levels = c("Very bad", "Bad", "Good", "Very good"))
df[ave(as.numeric(df$STATUS), df$ID, FUN = function(x) x == max(x)) == 1,]
#       ID    STATUS
#1 1638483  Very bad
#2 1407499 Very good
#3 1383920      Good

Data

df = structure(list(ID = c(1638483L, 1407499L, 1383920L, 1407499L), 
    STATUS = c("Very bad", "Very good", "Good", "Bad")), .Names = c("ID", 
"STATUS"), class = "data.frame", row.names = c(NA, -4L))
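To see what the `ave()` call does, it helps to look at the intermediate vector. `ave()` applies `FUN` within each ID group and writes the results back into the original row positions, so a row holding its group's maximum gets 1 and every other row gets 0 (the `TRUE`/`FALSE` values are coerced to numbers because the input vector is numeric). A sketch with the same data; the names `codes` and `keep` are only for illustration:

```r
df <- data.frame(
  ID = c(1638483L, 1407499L, 1383920L, 1407499L),
  STATUS = factor(c("Very bad", "Very good", "Good", "Bad"),
                  levels = c("Very bad", "Bad", "Good", "Very good"))
)

# Integer codes follow the level order: Very bad = 1, ..., Very good = 4
codes <- as.numeric(df$STATUS)

# Per-ID flag, returned in the original row order
keep <- ave(codes, df$ID, FUN = function(x) x == max(x))
keep  # 1 1 1 0

df[keep == 1, ]
```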