Select rows based on group_by rows and their column values

Date: 2019-03-08 00:16:47

Tags: r dataframe group-by dplyr grouping

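I made a reproducible dataset.

In this dataset, I am trying to group the rows by "value" and "category" with group_by and get the maximum of all the values within each "category", but only if a "value" in it is greater than 4.

Another way to put the question: get the maximum "value" for each label, only if a "value" in each "category" is greater than 4.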

das <- data.frame(val=1:24,
              weigh=c(10,10,10,11,11,11,20,20,20,21,21,21,30,30,30,31,31,31,40,40,40,41,41,41),
              value=c(4.1,3.2,4.3,1.1,2.2,5.3,2.1,2.2,3.3,3.1,8.2,1.3,3.6,2.1,3.1,3.1,3.1,1.1,7.2,4.5,5.1,3.2,2.5,9.1),
              label=c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4),
              category=c("A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","A","B","C"))

   val weigh value label category
1    1    10   4.1     1        A
2    2    10   3.2     1        B
3    3    10   4.3     1        C
4    4    11   1.1     1        A
5    5    11   2.2     1        B
6    6    11   5.3     1        C
7    7    20   2.1     2        A
8    8    20   2.2     2        B
9    9    20   3.3     2        C
10  10    21   3.1     2        A
11  11    21   8.2     2        B
12  12    21   1.3     2        C
13  13    30   3.6     3        A
14  14    30   2.1     3        B
15  15    30   3.1     3        C
16  16    31   3.1     3        A
17  17    31   3.1     3        B
18  18    31   1.1     3        C
19  19    40   7.2     4        A
20  20    40   4.5     4        B
21  21    40   5.1     4        C
22  22    41   3.2     4        A
23  23    41   2.5     4        B
24  24    41   9.1     4        C

This is the expected output:

   val weigh value label category
1    1    10   4.1     1        A
5    6    11   5.3     1        C
2    2    10   3.2     1        B
10  10    21   3.1     2        A
3   11    21   8.2     2        B
9    9    20   3.3     2        C
2   19    40   7.2     4        A
4   20    40   4.5     4        B
6   24    41   9.1     4        C

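I tried the following but did not get the expected output. Here I only get the rows with value > 4, rather than the largest number in each category for that label: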

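library(dplyr)

# Keep only the rows whose value is greater than 4
das1 <- das[das$value > 4, ]

# Then take the row with the largest value in each category/label group
result <- das1 %>%
  group_by(category, label) %>%
  slice(which.max(value))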


   val weigh value label category
1    1    10   4.1     1        A
5    6    11   5.3     1        C
3   11    21   8.2     2        B
2   19    40   7.2     4        A
4   20    40   4.5     4        B
6   24    41   9.1     4        C

2 answers:

Answer 0 (score: 3)

We can first group_by label and filter the groups that have any value > 4, then group_by label and category and select only the max value in each group.
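The two steps described above can be sketched roughly as follows. This is only an illustration of the description, not necessarily the answerer's exact code; it assumes dplyr is installed, rebuilds the question's data with rep() for brevity, and uses slice(which.max(value)) as one possible way to pick the maximum.

```r
library(dplyr)

# Same data as in the question, written compactly with rep()
das <- data.frame(
  val = 1:24,
  weigh = rep(c(10, 11, 20, 21, 30, 31, 40, 41), each = 3),
  value = c(4.1, 3.2, 4.3, 1.1, 2.2, 5.3, 2.1, 2.2, 3.3, 3.1, 8.2, 1.3,
            3.6, 2.1, 3.1, 3.1, 3.1, 1.1, 7.2, 4.5, 5.1, 3.2, 2.5, 9.1),
  label = rep(1:4, each = 6),
  category = rep(c("A", "B", "C"), times = 8)
)

result <- das %>%
  group_by(label) %>%
  filter(any(value > 4)) %>%        # keep whole labels that have some value > 4
  group_by(label, category) %>%
  slice(which.max(value)) %>%       # one row per label/category: its largest value
  ungroup() %>%
  arrange(label, category)

result
```

The key difference from the attempt in the question is that filter(any(value > 4)) is evaluated once per label, so a label is kept or dropped as a whole, and rows with value <= 4 inside a kept label remain available for the per-category maximum.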

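Answer 1 (score: 3)

I think your description is confusing because you keep saying different things. The code below matches your expected output, under this interpretation:

Get the maximum "value" for each "category" of each label, but only if some "value" in that "label" is greater than 4 (the OP says "category" here).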

library(tidyverse)
das <- data.frame(
  val = 1:24,
  weigh = c(10, 10, 10, 11, 11, 11, 20, 20, 20, 21, 21, 21, 30, 30, 30, 31, 31, 31, 40, 40, 40, 41, 41, 41),
  value = c(4.1, 3.2, 4.3, 1.1, 2.2, 5.3, 2.1, 2.2, 3.3, 3.1, 8.2, 1.3, 3.6, 2.1, 3.1, 3.1, 3.1, 1.1, 7.2, 4.5, 5.1, 3.2, 2.5, 9.1),
  label = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4),
  category = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C")
)

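das %>%
  # keep only labels that contain at least one value greater than 4
  group_by(label) %>%
  filter(any(value > 4)) %>%
  # within each label, keep the row(s) with the largest value per category
  group_by(label, category) %>%
  filter(value == max(value)) %>%
  arrange(label, category)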
#> # A tibble: 9 x 5
#> # Groups:   label, category [9]
#>     val weigh value label category
#>   <int> <dbl> <dbl> <dbl> <fct>   
#> 1     1    10   4.1     1 A       
#> 2     2    10   3.2     1 B       
#> 3     6    11   5.3     1 C       
#> 4    10    21   3.1     2 A       
#> 5    11    21   8.2     2 B       
#> 6     9    20   3.3     2 C       
#> 7    19    40   7.2     4 A       
#> 8    20    40   4.5     4 B       
#> 9    24    41   9.1     4 C

Created on 2019-03-07 by the reprex package (v0.2.1)
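One detail worth noting: filter(value == max(value)) and slice(which.max(value)) (used in the question's attempt) give the same rows on this dataset, but they behave differently when a group has tied maxima. A small sketch with hypothetical toy data (the toy data frame and its columns are made up for illustration):

```r
library(dplyr)

# Hypothetical toy data: group "x" has two rows tied for the maximum
toy <- data.frame(g = c("x", "x", "y"), value = c(5, 5, 2))

# filter(value == max(value)) keeps every row that ties for the maximum
ties_kept <- toy %>%
  group_by(g) %>%
  filter(value == max(value)) %>%
  ungroup()

# slice(which.max(value)) keeps only the first maximum in each group
first_only <- toy %>%
  group_by(g) %>%
  slice(which.max(value)) %>%
  ungroup()

nrow(ties_kept)   # 3
nrow(first_only)  # 2
```

Which one is preferable depends on whether exactly one row per label/category is required.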