Question

I have a data frame like this:

I want to extract rows based on Shape_Area and number_of_clusters.

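For ID_12 9, number_of_clusters is 1, so I need to slice the single row with the largest Shape_Area among the rows where ID_12 == 9. For ID_12 73, there are 4 clusters but 5 observations, so the 4 observations with the largest Shape_Area should appear in the resulting data frame.

I have tried this, but it does not seem to work: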

Answer 1

library(dplyr)

df %>%
  group_by(ID_12) %>%
  # sort by area in decreasing order within each ID_12 group
  arrange(desc(Shape_Area), .by_group = TRUE) %>%
  # create a new column called rank that stores the row number within the group;
  # since each group is sorted by decreasing area, the row with the largest area has rank = 1, and so on
  mutate(rank = row_number()) %>%
  # keep only rows whose rank is <= number_of_clusters
  filter(rank <= number_of_clusters) %>%
  # drop the grouping so the result is a plain data frame
  ungroup() %>%
  # keep only the columns of the original data frame (this drops rank)
  select(all_of(names(df)))
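
To see what the pipeline does on something concrete, here is a minimal sketch run on a small made-up data frame. The values below are hypothetical and are not the asker's data; only the column names ID_12, Shape_Area and number_of_clusters come from the question.

```r
library(dplyr)

# Hypothetical data for illustration only (not the data from the question):
# ID_12 9 has 2 rows and 1 cluster, ID_12 73 has 5 rows and 4 clusters
df <- data.frame(
  ID_12              = c(9, 9, 73, 73, 73, 73, 73),
  Shape_Area         = c(10, 25, 5, 40, 30, 20, 15),
  number_of_clusters = c(1, 1, 4, 4, 4, 4, 4)
)

result <- df %>%
  group_by(ID_12) %>%
  # largest areas first within each ID_12
  arrange(desc(Shape_Area), .by_group = TRUE) %>%
  # position of each row within its sorted group
  mutate(rank = row_number()) %>%
  # keep as many rows per group as number_of_clusters allows
  filter(rank <= number_of_clusters) %>%
  ungroup() %>%
  # drop the helper column again
  select(all_of(names(df)))

print(result)
```

With this input, ID_12 9 keeps only its row with Shape_Area 25, and ID_12 73 keeps the rows with areas 40, 30, 20 and 15, dropping the smallest one (5).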

Keep a varying number of duplicate rows based on two columns
