Question

Given a dataset, we can use top_n to limit the number of rows we return (i.e. sorting/ranking) in the tidyverse. I like the flexibility of most tidyverse operations because in most cases they can be undone, i.e. you can get back to where you started.

Using the data and a possible solution (which I wrote) to the question asked here, how can I best undo top_n?

Data:

df<-structure(list(milk = c(1L, 2L, 1L, 0L, 4L), bread = c(4L, 5L, 
2L, 1L, 10L), juice = c(3L, 4L, 6L, 5L, 2L), honey = c(1L, 2L, 
0L, 3L, 1L), eggs = c(4L, 4L, 7L, 3L, 5L), beef = c(2L, 3L, 0L, 
1L, 8L)), class = "data.frame", row.names = c(NA, -5L))

Code:

library(tidyverse)

df %>% 
  gather(key, value) %>% 
  group_by(key) %>% 
  summarise(Sum = sum(value)) %>% 
  arrange(desc(Sum)) %>% 
  top_n(3, Sum) %>% 
  ungroup()

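The above gives me this:

# A tibble: 3 x 2
  key     Sum
  <chr> <int>
1 eggs     23
2 bread    22
3 juice    20

What I would now like to (learn how to) do is get back to the original dataset without deleting any code, i.e. programmatically recover the original data from this result.

Naturally, I thought of spreading it (res is the result above):

spread(res, key, Sum)
# A tibble: 1 x 3
  bread  eggs juice
  <int> <int> <int>
1    22    23    20

However, I have not yet figured out how to go on from there, or an alternative way to undo top_n. How can I best do this?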
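For reference, gather() and top_n() used in the question still work but have since been superseded. A sketch of the same summary with their replacements, assuming tidyr >= 1.0.0 (pivot_longer()) and dplyr >= 1.0.0 (slice_max()); it should produce the same key/Sum table:

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- structure(list(milk = c(1L, 2L, 1L, 0L, 4L), bread = c(4L, 5L,
2L, 1L, 10L), juice = c(3L, 4L, 6L, 5L, 2L), honey = c(1L, 2L,
0L, 3L, 1L), eggs = c(4L, 4L, 7L, 3L, 5L), beef = c(2L, 3L, 0L,
1L, 8L)), class = "data.frame", row.names = c(NA, -5L))

res <- df %>%
  pivot_longer(everything(), names_to = "key", values_to = "value") %>%  # one row per cell
  group_by(key) %>%
  summarise(Sum = sum(value)) %>%  # one row per original column; result is ungrouped
  slice_max(Sum, n = 3)            # three largest totals, largest first
res
```

Either way, res only holds the totals, so there is nothing in it to reverse; getting the original columns back means going back to df and selecting by the kept keys, which is what the answers below do.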

Answer 1

A similar idea using pull, but with a slightly different approach:

library(tidyverse)

df %>%
  summarise_all(sum) %>%  # Your method of selecting 
  gather(key, val) %>%    # top three columns 
  top_n(3) %>%            # 
  arrange(-val) %>%       #
  pull(key) %>%           # pull 'key'
  select(df, .)           # select cols from df by `.`

#  eggs bread juice
#1    4     4     3
#2    4     5     4
#3    7     2     6
#4    3     1     5
#5    5    10     2

Then, building on the idea from the previous question:

df[, '['(names(sort(colSums(df), T)), 1:3)]

This gives the same result.
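The one-liner calls the subsetting operator as an ordinary function: '['(x, 1:3) is the same as x[1:3], and the positional T is the decreasing argument of sort(). Spelled out step by step (totals and top3 are illustrative names, not from the answer):

```r
# Same example data as in the question
df <- structure(list(milk = c(1L, 2L, 1L, 0L, 4L), bread = c(4L, 5L,
2L, 1L, 10L), juice = c(3L, 4L, 6L, 5L, 2L), honey = c(1L, 2L,
0L, 3L, 1L), eggs = c(4L, 4L, 7L, 3L, 5L), beef = c(2L, 3L, 0L,
1L, 8L)), class = "data.frame", row.names = c(NA, -5L))

totals <- colSums(df)                                # named vector of column totals
top3 <- names(sort(totals, decreasing = TRUE))[1:3]  # names of the three largest totals
df[, top3]                                           # all rows, only those columns
```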

Answer 2

Here is a very compact base R solution:

df[, order(-colSums(df))[1:3]]  # positions of the three largest column totals
  eggs bread juice
1    4     4     3
2    4     5     4
3    7     2     6
4    3     1     5
5    5    10     2
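In base R, order() returns the column positions sorted by total, while rank() returns each column's own rank. The two are inverse permutations, so they can agree on one data set (they do for df) and differ on another; only order() reliably picks the columns with the largest totals. A small hypothetical data frame, toy, shows the difference:

```r
toy <- data.frame(a = 1L, b = 3L, c = 2L)  # hypothetical totals: a = 1, b = 3, c = 2

order(-colSums(toy))  # 2 3 1: positions of b, c, a (largest total first)
rank(-colSums(toy))   # a = 3, b = 1, c = 2: the rank of each column

names(toy)[order(-colSums(toy))[1]]  # "b", the column with the largest total
names(toy)[rank(-colSums(toy))[1]]   # "c", which is not the largest
```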

Answer 3

Not necessarily the reverse process, but one option is to select by column name:

df %>% 
  gather(Key, Value) %>% 
  group_by(Key) %>%
  summarise(Sum = sum(Value)) %>% 
  arrange(desc(Sum)) %>%
  top_n(3, Sum) %>%
  ungroup() %>%
  pull(Key) %>% 
  {select(df, one_of(.))}

  eggs bread juice
1    4     4     3
2    4     5     4
3    7     2     6
4    3     1     5
5    5    10     2
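The braces in the last step matter. magrittr's %>% inserts the left-hand side as the first argument of the call unless . appears as a direct argument. In Answer 1, select(df, .) passes . directly, so nothing extra is inserted; here . sits inside one_of(), so without braces the call would become select(., df, one_of(.)). Wrapping the call in { } disables that insertion. A minimal sketch, using all_of() (the tidyselect helper that supersedes one_of() for a character vector of names) and a hard-coded keys vector standing in for the pull(Key) result:

```r
library(dplyr)

# Same example data as in the question
df <- structure(list(milk = c(1L, 2L, 1L, 0L, 4L), bread = c(4L, 5L,
2L, 1L, 10L), juice = c(3L, 4L, 6L, 5L, 2L), honey = c(1L, 2L,
0L, 3L, 1L), eggs = c(4L, 4L, 7L, 3L, 5L), beef = c(2L, 3L, 0L,
1L, 8L)), class = "data.frame", row.names = c(NA, -5L))

keys <- c("eggs", "bread", "juice")  # stands in for the output of pull(Key)

# Braces: keys is bound to . only and is not inserted as select()'s first argument
out <- keys %>% {select(df, all_of(.))}
out
```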

Or, store the values and row numbers in list columns, then unnest and spread them:

df %>% 
 gather(Key, Value) %>% 
 group_by(Key) %>%
 summarise(Sum = sum(Value),
           Values = list(Value),
           Row_ID = list(row_number())) %>% 
 arrange(desc(Sum)) %>% 
 top_n(3, Sum) %>%
 select(-Sum) %>%
 ungroup() %>%
 unnest() %>%
 spread(Key, Values) %>%
 select(-Row_ID)

  bread  eggs juice
  <int> <int> <int>
1     4     4     3
2     5     4     4
3     2     7     6
4     1     3     5
5    10     5     2
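With tidyr >= 1.0.0, calling unnest() without cols raises a deprecation warning, and spread() is superseded by pivot_wider(). A sketch of the same list-column round trip with the newer verbs, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0; it uses seq_along(Value) for the row positions in place of row_number():

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- structure(list(milk = c(1L, 2L, 1L, 0L, 4L), bread = c(4L, 5L,
2L, 1L, 10L), juice = c(3L, 4L, 6L, 5L, 2L), honey = c(1L, 2L,
0L, 3L, 1L), eggs = c(4L, 4L, 7L, 3L, 5L), beef = c(2L, 3L, 0L,
1L, 8L)), class = "data.frame", row.names = c(NA, -5L))

out <- df %>%
  pivot_longer(everything(), names_to = "Key", values_to = "Value") %>%
  group_by(Key) %>%
  summarise(Sum = sum(Value),
            Values = list(Value),              # keep the column's original values
            Row_ID = list(seq_along(Value))) %>%  # and their row positions
  slice_max(Sum, n = 3) %>%                    # three largest totals, largest first
  select(-Sum) %>%
  unnest(cols = c(Values, Row_ID)) %>%         # back to one row per cell
  pivot_wider(names_from = Key, values_from = Values) %>%  # columns in order of first appearance
  select(-Row_ID)
out
```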

Returning the original dataset after top_n
