Question

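I want to select a different number of rows from each group of my data frame, and I haven't found an elegant way to do this with dplyr. To select the same number of rows for each group, I would do this: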

library(dplyr)

iris %>% 
    group_by(Species) %>%
    arrange(Sepal.Length) %>%
    top_n(2, Sepal.Length)

But I'd like to be able to refer to another table that holds the number of rows wanted for each group, like this:

top_rows_desired <- data.frame(Species = unique(iris$Species),
    n_desired = c(4,2,5))

Answer 1

We can do a left_join of 'iris' and 'top_rows_desired' by 'Species', group by 'Species', sort by descending 'Sepal.Length', slice the sequence up to the first 'n_desired' in each group, and remove the 'n_desired' column with select.

left_join(iris, top_rows_desired, by = "Species") %>%
    group_by(Species) %>%
    arrange(desc(Sepal.Length)) %>%
    slice(seq(first(n_desired))) %>%
    select(-n_desired)
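
As a quick sanity check, the pipeline above can be run end to end and the number of rows kept per species compared with the lookup table. This is only a sketch: the names result, rows_kept and check are introduced here for illustration and are not part of the original answer.

```r
library(dplyr)

# Lookup table from the question: number of rows wanted per species.
top_rows_desired <- data.frame(Species = unique(iris$Species),
    n_desired = c(4, 2, 5))

# Pipeline from the answer, with the grouping dropped at the end.
result <- left_join(iris, top_rows_desired, by = "Species") %>%
    group_by(Species) %>%
    arrange(desc(Sepal.Length)) %>%
    slice(seq(first(n_desired))) %>%
    select(-n_desired) %>%
    ungroup()

# Count the rows kept per species (column n) and put them next to
# the requested counts (column n_desired).
rows_kept <- count(result, Species)
check <- left_join(rows_kept, top_rows_desired, by = "Species")
print(check)
```

One thing to keep in mind: seq(0) returns c(1, 0) rather than an empty vector, so if a group's n_desired could be 0, seq_len(first(n_desired)) is probably the safer choice.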

Answer 2

I'm adding this answer only for those who can't run the code akrun provided. I struggled with it for a while. This answer addresses issue #2531 mentioned on github.

You may not be able to run slice because xgboost is loaded in your environment. xgboost masks dplyr's slice function, which causes this problem:

Attaching package: ‘xgboost’

The following object is masked from ‘package:dplyr’:

slice

Warning message:
package ‘xgboost’ was built under R version 3.4.1

So calling it as dplyr::slice might work for you.
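
As a sketch of the namespace-qualified call, here is the pipeline from Answer 1 with slice written as dplyr::slice, so the call no longer depends on which package was attached last. The final line is an extra check added here, not part of the original answer; it reports which namespace the bare name slice currently resolves to.

```r
library(dplyr)

# Lookup table from the question: number of rows wanted per species.
top_rows_desired <- data.frame(Species = unique(iris$Species),
    n_desired = c(4, 2, 5))

# dplyr::slice names the dplyr function explicitly, so it is used even
# if another attached package masks the unqualified name `slice`.
result <- left_join(iris, top_rows_desired, by = "Species") %>%
    group_by(Species) %>%
    arrange(desc(Sepal.Length)) %>%
    dplyr::slice(seq(first(n_desired))) %>%
    select(-n_desired)

# Which package does the unqualified `slice` come from right now?
environmentName(environment(slice))
```

If the last line reports a package other than dplyr, the unqualified slice is masked in that session and the qualified form is the one to use.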

I wasted an hour because of this. Hope it helps.

Filter n rows of a grouped data frame when n differs for each group
