I have a data frame of daily data like this:
date firms value ID
6/4/2007 A 16 1
6/5/2007 A 18 1
6/20/2007 A 22 2
6/29/2007 A 25 2
6/21/2007 A 12 3
6/4/2007 B 14 1
6/5/2007 B 19 2
6/20/2007 B 17 2
6/29/2007 B 12 1
6/21/2007 B 10 3
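Now I want to get, for each day, the sum of value by ID. Because this is panel data, the sums will repeat on repeated dates. The expected output is as follows: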
date       firms  value  ID  ID 1 Sum  ID 2 Sum  ID 3 Sum
6/4/2007   A      16     1   30        0         0
6/5/2007   A      18     1   18        19        0
6/20/2007  A      22     2   0         39        0
6/29/2007  A      25     2   12        25        0
6/21/2007  A      12     3   0         0         22
6/4/2007   B      14     1   30        0         0
6/5/2007   B      19     2   18        19        0
6/20/2007  B      17     2   0         39        0
6/29/2007  B      12     1   12        25        0
6/21/2007  B      10     3   0         0         22
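Please help me with this. I could not find code for it online.
Answer 0 (score: 1)
You can convert the data from long to wide format and then use summarise_if or mutate_if to get the desired output.
To rename the resulting columns, see this answer.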
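library(dplyr)
library(tidyr)
# txt holds the question's table (header row included) as a single string
df <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
# Long to wide: one column per ID; rowid keeps every row distinct for spread()
df_wide <- df %>%
  mutate(date = as.Date(date, '%m/%d/%Y')) %>%
  mutate(rowid = row_number()) %>%
  spread(ID, value) %>%
  select(-rowid)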
# One row per date: sum each ID column, ignoring the NAs left by spread()
df_wide %>%
  group_by(date) %>%
  summarise_if(is.numeric, sum, na.rm = TRUE)
#> # A tibble: 5 x 4
#>   date         `1`   `2`   `3`
#>   <date>     <int> <int> <int>
#> 1 2007-06-04    30     0     0
#> 2 2007-06-05    18    19     0
#> 3 2007-06-20     0    39     0
#> 4 2007-06-21     0     0    22
#> 5 2007-06-29    12    25     0
# Keep every row: each ID column is replaced by its per-date sum
df_wide %>%
  group_by(date) %>%
  mutate_if(is.numeric, sum, na.rm = TRUE) %>%
  arrange(firms)
#> # A tibble: 10 x 5
#> # Groups:   date [5]
#>    date       firms   `1`   `2`   `3`
#>    <date>     <chr> <int> <int> <int>
#>  1 2007-06-04 A        30     0     0
#>  2 2007-06-05 A        18    19     0
#>  3 2007-06-20 A         0    39     0
#>  4 2007-06-21 A         0     0    22
#>  5 2007-06-29 A        12    25     0
#>  6 2007-06-04 B        30     0     0
#>  7 2007-06-05 B        18    19     0
#>  8 2007-06-20 B         0    39     0
#>  9 2007-06-21 B         0     0    22
#> 10 2007-06-29 B        12    25     0
Created on 2018-10-01 by the reprex package (v0.2.1.9000)
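On newer package versions, the same long-to-wide idea could also be sketched with pivot_wider(), which tidyr now offers in place of spread(). The following is only an illustration, assuming tidyr >= 1.1.0 and dplyr >= 1.0.0. The names id_sums, result, and the ID_<n>_Sum column pattern are illustrative choices, not part of the answer above. Unlike the output above, it keeps the value and ID columns and leaves date as the original text, as in the question's expected output.

```r
library(dplyr)
library(tidyr)

# The question's data, copied as text so the sketch is self-contained
txt <- "date firms value ID
6/4/2007 A 16 1
6/5/2007 A 18 1
6/20/2007 A 22 2
6/29/2007 A 25 2
6/21/2007 A 12 3
6/4/2007 B 14 1
6/5/2007 B 19 2
6/20/2007 B 17 2
6/29/2007 B 12 1
6/21/2007 B 10 3"
df <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Per-date total for each ID, then one column per ID.
# arrange(ID) makes the new columns appear in ID order;
# date/ID combinations that do not occur are filled with 0.
id_sums <- df %>%
  group_by(date, ID) %>%
  summarise(total = sum(value), .groups = "drop") %>%
  arrange(ID) %>%
  pivot_wider(names_from = ID, values_from = total,
              names_glue = "ID_{ID}_Sum", values_fill = 0)

# Join the per-date totals back so every original row carries them
result <- df %>% left_join(id_sums, by = "date")
result
```

Because the totals are joined back onto df, the rows stay in the question's original order.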
Answer 1 (score: 0)
We can also use dcast from data.table.
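library(data.table)
# Add the per-(ID, date) sum of value by reference as ID_Sum
setDT(df)[, ID_Sum := sum(value), by = .(ID, date)]
# One Sum_<ID> column per ID; cells with no matching row are filled with 0
dcast(df, date + firms + value ~ paste0("Sum_", ID), value.var = 'ID_Sum', fill = 0)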
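Note that the dcast call above keeps date + firms + value on the left-hand side of the formula, so each output row appears to get a non-zero sum only in the column for its own ID. If the goal is the question's expected output, where every row shows its date's totals for all IDs, one possible variant is to cast the sums by date alone and join them back. This is a sketch under that assumption. The names dt, sums, and result are illustrative.

```r
library(data.table)

# The question's data, copied as text so the sketch is self-contained
txt <- "date firms value ID
6/4/2007 A 16 1
6/5/2007 A 18 1
6/20/2007 A 22 2
6/29/2007 A 25 2
6/21/2007 A 12 3
6/4/2007 B 14 1
6/5/2007 B 19 2
6/20/2007 B 17 2
6/29/2007 B 12 1
6/21/2007 B 10 3"
dt <- as.data.table(read.table(text = txt, header = TRUE, stringsAsFactors = FALSE))

# One row per date, one Sum_<ID> column per ID;
# date/ID combinations that do not occur are filled with 0
sums <- dcast(dt, date ~ paste0("Sum_", ID), value.var = "value",
              fun.aggregate = sum, fill = 0L)

# Look up each original row's date in sums (rows follow dt's order),
# then move dt's own columns back to the front
result <- sums[dt, on = "date"]
setcolorder(result, names(dt))
result
```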