Keep a different number of duplicate rows based on two columns

Time: 2019-11-07 17:20:48

Tags: r sorting dplyr

I have a data frame whose columns include ID_12, Shape_Area and number_of_clusters.

I want to extract rows based on Shape_Area and number_of_clusters.

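For ID_12 == 9, number_of_clusters is 1, so only the row with the largest Shape_Area among the ID_12 == 9 rows should be kept. For ID_12 == 73 there are 4 clusters but 5 observations, so the 4 observations with the largest Shape_Area should appear in the resulting data frame.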

I have already tried to do this, but my attempt does not seem to work properly.

1 answer:

Answer 0 (score: 0)

library(dplyr)

df %>%
  group_by(ID_12) %>%
  # sort by area in decreasing order within each ID_12
  arrange(desc(Shape_Area), .by_group = TRUE) %>%
  # add a rank column holding the row number within the group;
  # because of the descending sort, the largest area in each group gets rank 1
  mutate(rank = row_number()) %>%
  # keep only rows whose rank is <= number_of_clusters
  filter(rank <= number_of_clusters) %>%
  # drop the grouping and keep only the columns of the original data frame
  ungroup() %>%
  select(all_of(names(df)))
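
For a quick check, the same idea can be sketched on a small made-up data frame. The values below are hypothetical and are not taken from the question's data. This is also a slightly more compact variant, not the exact pipeline above: it ranks inside filter() with row_number(desc(Shape_Area)) instead of sorting and adding a separate rank column, so the surviving rows stay in their original order.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
df <- data.frame(
  ID_12 = c(9, 9, 73, 73, 73, 73, 73),
  Shape_Area = c(10, 30, 5, 8, 2, 9, 7),
  number_of_clusters = c(1, 1, 4, 4, 4, 4, 4)
)

result <- df %>%
  group_by(ID_12) %>%
  # rank rows by descending area within each group and
  # keep as many rows as number_of_clusters allows
  filter(row_number(desc(Shape_Area)) <= number_of_clusters) %>%
  ungroup()

print(result)
```

With this toy data, ID_12 == 9 should keep only its largest row, and ID_12 == 73 should lose only its smallest of five rows. Ties in Shape_Area are broken by row order, because row_number() assigns distinct ranks.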