Group by two columns and summarise multiple columns

Time: 2020-09-10 20:20:38

Tags: r dataframe dplyr tidyverse

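I have a data frame that I want to group by the State and Date columns, and then sum the values of the remaining columns within each group.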

df

State  Female  Male  Date
------------------------------
Texas  2       2     01/01/04
Texas  3       1     01/01/04
Texas  5       4     02/01/04
Cali   1       1     05/06/05
Cali   2       1     05/06/05
Cali   3       1     10/06/05
Cali   1       2     10/06/05
NY     10      5     11/06/05
NY     11      6     12/06/05

Expected result

df

State  Female  Male  Date
------------------------------
Texas  5       3     01/01/04
Texas  5       4     02/01/04
Cali   3       2     05/06/05
Cali   4       3     10/06/05
NY     10      5     11/06/05
NY     11      6     12/06/05

I tried group_by followed by summarise, but I don't know how to do the same for two columns.

My attempt

df <- df_homicides %>% 
        group_by(state) %>% 
        summarise(Female = sum(Female))

Thanks for your help!

2 answers:

Answer 0 (score: 3)

We can use summarise with across from dplyr version >= 1.0.0:

library(dplyr)
df %>%
    group_by(State, Date) %>%
    summarise(across(everything(), sum, na.rm = TRUE), .groups = 'drop')
# A tibble: 6 x 4
#  State Date       Female  Male
#  <chr> <chr>       <int> <int>
#1 Cali  05/06/2005      3     2
#2 Cali  10/06/2005      4     3
#3 NY    11/06/2005     10     5
#4 NY    12/06/2005     11     6
#5 Texas 01/01/2004      5     3
#6 Texas 02/01/2004      5     4

Or use aggregate from base R:

aggregate(. ~ State + Date, df, sum, na.rm = TRUE)
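If the data frame held further columns that should not be summed, the columns to aggregate could be named explicitly rather than taking every non-grouping column. The following is only a minimal sketch, assuming dplyr >= 1.0.0 and the State, Date, Female and Male columns from the question; the sample df is rebuilt from the question's table, and res and res_base are illustrative names.

```r
library(dplyr)

# Sample data rebuilt from the table in the question (assumption: dates kept as text)
df <- data.frame(
  State  = c("Texas", "Texas", "Texas", "Cali", "Cali", "Cali", "Cali", "NY", "NY"),
  Female = c(2L, 3L, 5L, 1L, 2L, 3L, 1L, 10L, 11L),
  Male   = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L),
  Date   = c("01/01/04", "01/01/04", "02/01/04", "05/06/05", "05/06/05",
             "10/06/05", "10/06/05", "11/06/05", "12/06/05")
)

# Sum only the named columns within each State/Date group
res <- df %>%
  group_by(State, Date) %>%
  summarise(across(c(Female, Male), ~ sum(.x, na.rm = TRUE)), .groups = "drop")

# Base R counterpart: cbind() on the left-hand side names the columns to sum
res_base <- aggregate(cbind(Female, Male) ~ State + Date, data = df, FUN = sum)
```

Both results should contain one row per State/Date pair; the row order may differ between the two approaches.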

Answer 1 (score: 1)

Try this. You can use summarise_all() to aggregate several variables with the function you need. Here is the code:

library(dplyr)
#Code
df %>% group_by(State,Date) %>%
  summarise_all(.funs = sum, na.rm = TRUE)

Output:

# A tibble: 6 x 4
# Groups:   State [3]
  State Date       Female  Male
  <chr> <chr>       <int> <int>
1 Cali  05/06/2005      3     2
2 Cali  10/06/2005      4     3
3 NY    11/06/2005     10     5
4 NY    12/06/2005     11     6
5 Texas 01/01/2004      5     3
6 Texas 02/01/2004      5     4
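The `# Groups:   State [3]` line in this output indicates that the result is still grouped by State, since summarising removes only the last grouping level. If a plain, ungrouped tibble is wanted, one option is to append ungroup(). This is a rough sketch, assuming the same sample data as in this answer; out is an illustrative name. Note also that summarise_all() is marked as superseded in recent dplyr releases in favour of across().

```r
library(dplyr)

# Same sample data as used in this answer
df <- data.frame(
  State  = c("Texas", "Texas", "Texas", "Cali", "Cali", "Cali", "Cali", "NY", "NY"),
  Female = c(2L, 3L, 5L, 1L, 2L, 3L, 1L, 10L, 11L),
  Male   = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L),
  Date   = c("01/01/2004", "01/01/2004", "02/01/2004", "05/06/2005",
             "05/06/2005", "10/06/2005", "10/06/2005", "11/06/2005", "12/06/2005")
)

# Aggregate as in the answer, then drop the remaining State grouping
out <- df %>%
  group_by(State, Date) %>%
  summarise_all(.funs = sum, na.rm = TRUE) %>%
  ungroup()

# Check that no grouping is left on the result
is_grouped_df(out)
```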

Data used:

#Data
df <- structure(list(State = c("Texas", "Texas", "Texas", "Cali", "Cali", 
"Cali", "Cali", "NY", "NY"), Female = c(2L, 3L, 5L, 1L, 2L, 3L, 
1L, 10L, 11L), Male = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L), 
    Date = c("01/01/2004", "01/01/2004", "02/01/2004", "05/06/2005", 
    "05/06/2005", "10/06/2005", "10/06/2005", "11/06/2005", "12/06/2005"
    )), class = "data.frame", row.names = c(NA, -9L))