R: Summarise and count the occurrences of each unique value in column B while grouping by column A

Time: 2020-06-06 15:22:52

Tags: r

I have the following data, dfs_alltasks:

    by_hour task
1   0       Apple Receiving
2   0       Apple Receiving
3   0       Orange Receiving
4   0       Banana Receiving
5   0       Banana Receiving
6   0       Orange Receiving
7   1       Orange Receiving
8   1       Banana Receiving
9   1       Banana Receiving
10  1       Banana Receiving
11  1       Banana Receiving
12  1       Banana Receiving
13  1       Orange Receiving
14  2       Banana Receiving
15  3       Banana Receiving

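I would like to group by the "by_hour" column and, at the same time, summarise and return the number of times each task occurs within each group. I should get something like this: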

    by_hour task              count
1   0       Apple Receiving   2
2   0       Orange Receiving  2
3   0       Banana Receiving  2
4   1       Orange Receiving  2
5   1       Banana Receiving  5
6   2       Banana Receiving  1
7   3       Banana Receiving  1

I tried: dfs_alltasks %>% group_by(by_hour) %>% summarise_all(no_rows = length(task))

But I get the error "Error in list2(...) : object 'task' not found".

4 Answers:

Answer 0: (score: 3)

You don't need group_by here; count() does the grouping for you.

library(tidyverse)

df_example <-
  structure(list(
    by_hour = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
                1, 2, 3),
    task = c(
      "Apple Remaining",
      "Apple Remaining",
      "Orange Remaining",
      "Banana Remaining",
      "Banana Remaining",
      "Orange Remaining",
      "Orange Remaining",
      "Banana Remaining",
      "Banana Remaining",
      "Banana Remaining",
      "Banana Remaining",
      "Banana Remaining",
      "Orange Remaining",
      "Banana Remaining",
      "Banana Remaining"
    )
  ),
  class = "data.frame",
  row.names = c(NA, -15L))

df_example %>% 
  count(by_hour,task)
#>   by_hour             task n
#> 1       0  Apple Remaining 2
#> 2       0 Banana Remaining 2
#> 3       0 Orange Remaining 2
#> 4       1 Banana Remaining 5
#> 5       1 Orange Remaining 2
#> 6       2 Banana Remaining 1
#> 7       3 Banana Remaining 1

Created on 2020-06-06 by the reprex package (v0.3.0)
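
The question asks for a column called count rather than n. A minimal sketch of how that could be done, assuming a dplyr version that supports the name argument of count(); the tasks data frame below is a shortened, hypothetical stand-in for dfs_alltasks:

```r
library(dplyr)

# Hypothetical toy data with the same two columns as dfs_alltasks
tasks <- data.frame(
  by_hour = c(0, 0, 0, 1, 1, 2),
  task = c("Apple Receiving", "Apple Receiving", "Orange Receiving",
           "Banana Receiving", "Banana Receiving", "Banana Receiving")
)

# count() groups by the listed columns and tallies the rows in each group;
# name = "count" renames the default output column n
counts <- tasks %>%
  count(by_hour, task, name = "count")

print(counts)
```

The result has one row per (by_hour, task) combination, and the count column sums to the number of input rows.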

Answer 1: (score: 1)

Try this:

library(tibble)
library(dplyr)
data <- tibble::tribble(
   ~by_hour, ~task,
  0 ,      "Apple Receiving",  
  0 ,      "Apple Receiving", 
  0 ,      "Orange Receiving",
  0 ,      "Banana Receiving",
  0 ,      "Banana Receiving",
  0 ,      "Orange Receiving",
  1 ,      "Orange Receiving",
  1 ,      "Banana Receiving",
  1 ,      "Banana Receiving",
  1 ,      "Banana Receiving",
  1 ,      "Banana Receiving",
  1 ,      "Banana Receiving",
  1 ,      "Orange Receiving",
  2 ,      "Banana Receiving",
  3 ,      "Banana Receiving")
data %>% group_by(by_hour,task) %>% summarize(count=n()) %>% ungroup()
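
For comparison, the attempt in the question can probably be repaired with two changes: group by both columns, and use summarise() instead of summarise_all(). As far as I can tell, summarise_all() expects functions to apply to every column, so the named expression length(task) is evaluated outside the data and task is not found. The sketch below assumes dplyr 1.0.0 or later for the .groups argument; tasks is a hypothetical, shortened stand-in for dfs_alltasks:

```r
library(dplyr)

# Hypothetical toy data with the same two columns as dfs_alltasks
tasks <- data.frame(
  by_hour = c(0, 0, 0, 1, 1, 2),
  task = c("Apple Receiving", "Apple Receiving", "Orange Receiving",
           "Banana Receiving", "Banana Receiving", "Banana Receiving")
)

# Group by both columns so each (by_hour, task) pair is its own group,
# then take the group size; .groups = "drop" returns an ungrouped result
fixed <- tasks %>%
  group_by(by_hour, task) %>%
  summarise(no_rows = length(task), .groups = "drop")

print(fixed)
```

Inside summarise(), length(task) and n() give the same number here, because both measure the size of the current group.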

Answer 2: (score: 1)

Please consider providing a sample of your data with dput().
df <- structure(list(by_hour = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 
1, 2, 3), task = c("Apple Remaining", "Apple Remaining", "Orange Remaining", 
"Banana Remaining", "Banana Remaining", "Orange Remaining", "Orange Remaining", 
"Banana Remaining", "Banana Remaining", "Banana Remaining", "Banana Remaining", 
"Banana Remaining", "Orange Remaining", "Banana Remaining", "Banana Remaining"
)), class = "data.frame", row.names = c(NA, -15L))

You can use the dplyr package and group_by both variables.

library(dplyr)
df %>% 
  group_by(by_hour, task) %>% 
  count %>% 
  ungroup

Result

  by_hour task                 n
    <dbl> <chr>            <int>
1       0 Apple Remaining      2
2       0 Banana Remaining     2
3       0 Orange Remaining     2
4       1 Banana Remaining     5
5       1 Orange Remaining     2
6       2 Banana Remaining     1
7       3 Banana Remaining     1

Answer 3: (score: 1)

We can also use data.table:

library(data.table)
setDT(df)[, .(n = .N), .(by_hour, task)]
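
Note that setDT() converts df to a data.table by reference. If the original data frame should stay untouched, a variant along the following lines may be preferable. This is only a sketch: tasks is a hypothetical, shortened stand-in for the data, and keyby is used in place of by so that the result is also sorted by the grouping columns.

```r
library(data.table)

# Hypothetical toy data with the same two columns as dfs_alltasks
tasks <- data.frame(
  by_hour = c(0, 0, 0, 1, 1, 2),
  task = c("Apple Receiving", "Apple Receiving", "Orange Receiving",
           "Banana Receiving", "Banana Receiving", "Banana Receiving")
)

# as.data.table() makes a copy, so `tasks` itself remains a plain data.frame
dt <- as.data.table(tasks)

# .N is the number of rows in each group; keyby groups and sorts the result
counts <- dt[, .(count = .N), keyby = .(by_hour, task)]

print(counts)
```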