Sum of values by ID

Time: 2018-10-01 14:29:01

Tags: r dplyr sum tidyr

I have a data frame of daily data like this:

date    firms value  ID
6/4/2007    A   16    1
6/5/2007    A   18    1
6/20/2007   A   22    2
6/29/2007   A   25    2
6/21/2007   A   12    3
6/4/2007    B   14    1
6/5/2007    B   19    2
6/20/2007   B   17    2
6/29/2007   B   12    1
6/21/2007   B   10    3

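Now I want the sum of value for each ID on each day. Because this is panel data, the sums repeat on every row that shares the same date. The expected output is: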

date    firms   value   ID        ID 1 Sum    ID 2 Sum      ID 3 Sum 
6/4/2007    A    16     1             30           0              0
6/5/2007    A    18     1             18          19              0
6/20/2007   A    22     2              0          39              0
6/29/2007   A    25     2             12          25              0
6/21/2007   A    12     3              0           0              22
6/4/2007    B    14     1             30           0              0
6/5/2007    B    19     2             18          19              0
6/20/2007   B    17     2              0          39              0
6/29/2007   B    12     1             12          25              0
6/21/2007   B    10     3              0          0               22

Please help me with this. I could not find any code for it online.

2 answers:

Answer 0: (score: 1)

You can convert the data from long to wide format, then use summarise_if or mutate_if to get the desired output.

To change the names of the resulting columns, see this answer.

library(dplyr)
library(tidyr)

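# Sample data from the question, as a whitespace-separated text block
txt <- "date firms value ID
6/4/2007 A 16 1
6/5/2007 A 18 1
6/20/2007 A 22 2
6/29/2007 A 25 2
6/21/2007 A 12 3
6/4/2007 B 14 1
6/5/2007 B 19 2
6/20/2007 B 17 2
6/29/2007 B 12 1
6/21/2007 B 10 3"

df <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)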

df_wide <- df %>% 
  mutate(date = as.Date(date, '%m/%d/%Y')) %>% 
  mutate(rowid = row_number()) %>% 
  spread(ID, value) %>% 
  select(-rowid)

df_wide %>% 
  group_by(date) %>% 
  summarise_if(is.numeric, funs(sum(., na.rm = TRUE)))
#> # A tibble: 5 x 4
#>   date         `1`   `2`   `3`
#>   <date>     <int> <int> <int>
#> 1 2007-06-04    30     0     0
#> 2 2007-06-05    18    19     0
#> 3 2007-06-20     0    39     0
#> 4 2007-06-21     0     0    22
#> 5 2007-06-29    12    25     0

df_wide %>% 
  group_by(date) %>% 
  mutate_if(is.numeric, funs(sum(., na.rm = TRUE))) %>% 
  arrange(firms)
#> # A tibble: 10 x 5
#> # Groups:   date [5]
#>    date       firms   `1`   `2`   `3`
#>    <date>     <chr> <int> <int> <int>
#>  1 2007-06-04 A        30     0     0
#>  2 2007-06-05 A        18    19     0
#>  3 2007-06-20 A         0    39     0
#>  4 2007-06-21 A         0     0    22
#>  5 2007-06-29 A        12    25     0
#>  6 2007-06-04 B        30     0     0
#>  7 2007-06-05 B        18    19     0
#>  8 2007-06-20 B         0    39     0
#>  9 2007-06-21 B         0     0    22
#> 10 2007-06-29 B        12    25     0

Created on 2018-10-01 by the reprex package (v0.2.1.9000)
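
spread() and funs() have since been superseded or deprecated in later tidyr and dplyr releases, and the output above drops the original value and ID columns. The same long-to-wide idea can be sketched with pivot_wider(), which also names the sum columns in the same step. This is only a sketch, assuming dplyr >= 1.0 and tidyr >= 1.1; the names sums_by_date, result, total, and the ID_{ID}_Sum column pattern are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df <- tibble::tribble(
  ~date,       ~firms, ~value, ~ID,
  "6/4/2007",  "A",    16,     1,
  "6/5/2007",  "A",    18,     1,
  "6/20/2007", "A",    22,     2,
  "6/29/2007", "A",    25,     2,
  "6/21/2007", "A",    12,     3,
  "6/4/2007",  "B",    14,     1,
  "6/5/2007",  "B",    19,     2,
  "6/20/2007", "B",    17,     2,
  "6/29/2007", "B",    12,     1,
  "6/21/2007", "B",    10,     3
) %>%
  mutate(date = as.Date(date, "%m/%d/%Y"))

# Sum value per (date, ID), then turn each ID into its own column.
# arrange(ID) keeps the new columns in ID order; missing combinations become 0.
sums_by_date <- df %>%
  group_by(date, ID) %>%
  summarise(total = sum(value), .groups = "drop") %>%
  arrange(ID) %>%
  pivot_wider(
    names_from  = ID,
    values_from = total,
    names_glue  = "ID_{ID}_Sum",  # illustrative naming pattern
    values_fill = 0
  )

# Attach the per-date sums to every original row, so they repeat across firms
result <- left_join(df, sums_by_date, by = "date")
result
```

Joining back on date is what makes the sums repeat on every row of the same day while leaving firms, value, and ID untouched.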

Answer 1: (score: 0)

We can also use dcast from data.table:

library(data.table)
setDT(df)[, ID_Sum := sum(value), by = .(ID, date)]
dcast(df, date + firms + value ~ paste0("Sum_", ID), value.var = 'ID_Sum', fill = 0)
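
In the call above, ID_Sum is computed per (ID, date) and the cast keeps one row per date, firm, and value, so each row appears to get a non-zero entry only in the column for its own ID. If every row should instead carry the sums of all IDs for its date, as in the question's expected output, one possible variation is to aggregate inside dcast and join the result back. A sketch under that assumption; dt, wide, and result are illustrative names:

```r
library(data.table)

# Sample data copied from the question
dt <- data.table(
  date  = rep(c("6/4/2007", "6/5/2007", "6/20/2007", "6/29/2007", "6/21/2007"), 2),
  firms = rep(c("A", "B"), each = 5),
  value = c(16, 18, 22, 25, 12, 14, 19, 17, 12, 10),
  ID    = c(1, 1, 2, 2, 3, 1, 2, 2, 1, 3)
)

# One row per date, one Sum_<ID> column per ID; dates with no rows for an ID get 0
wide <- dcast(dt, date ~ paste0("Sum_", ID),
              value.var = "value", fun.aggregate = sum, fill = 0)

# Attach the per-date sums to every original row
result <- merge(dt, wide, by = "date", sort = FALSE)
result
```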