Cumulative sum of amounts by month in R

Time: 2020-09-09 18:57:35

Tags: r data-science data-cleaning

I want to transform my data from this:

Month   Expenditures
1       1
1       2
2       3
2       6
3       2
3       5

to this:

Month   Cumulative_expenditures
1       3
2       12
3       19

but I can't seem to figure out how to do it.

I tried the cumsum() function, but it accumulates over every single observation and does not distinguish between groups.
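
To make the problem concrete, here is a minimal sketch of what plain cumsum() returns on the sample data. The data frame name `expenses` is made up for this illustration and is not from my real code.

```r
# Sample data from the question (the name `expenses` is hypothetical)
expenses <- data.frame(
  Month        = c(1L, 1L, 2L, 2L, 3L, 3L),
  Expenditures = c(1L, 2L, 3L, 6L, 2L, 5L)
)

# cumsum() accumulates row by row, so each observation gets its own value
running <- cumsum(expenses$Expenditures)
running
#> [1]  1  3  6 12 14 19
```

That is six values, one per row, where I want only one value per month (3, 12, 19).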

Any help would be greatly appreciated!

4 answers:

Answer 0 (score: 2)

A two-step base R solution is:

# Step 1: sum the expenditures within each month
df1 <- aggregate(Expenditures~Month,data=mydf,sum)
# Step 2: turn the monthly totals into a running total
df1$Expenditures <- cumsum(df1$Expenditures)

Output:

  Month Expenditures
1     1            3
2     2           12
3     3           19

Data used:

#Data
mydf <- structure(list(Month = c(1L, 1L, 2L, 2L, 3L, 3L), Expenditures = c(1L, 
2L, 3L, 6L, 2L, 5L)), class = "data.frame", row.names = c(NA, 
-6L))
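
If it helps to see the intermediate step, the following sketch keeps the monthly totals and stores the running total in a separate column. The column name Cumulative_expenditures is taken from the question; the object name `monthly` is only an illustrative choice.

```r
# Same sample data as above
mydf <- data.frame(
  Month        = c(1L, 1L, 2L, 2L, 3L, 3L),
  Expenditures = c(1L, 2L, 3L, 6L, 2L, 5L)
)

# Step 1: one row per month, holding that month's total (3, 9, 7)
monthly <- aggregate(Expenditures ~ Month, data = mydf, FUN = sum)

# Step 2: running total across months, kept next to the monthly totals
monthly$Cumulative_expenditures <- cumsum(monthly$Expenditures)
monthly
```

Keeping both columns makes it easy to check that each cumulative value equals the previous one plus the current month's total.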

Answer 1 (score: 1)

Using dplyr:

library(dplyr)

df %>% 
  group_by(Month) %>% 
  summarise(Expenditures = sum(Expenditures), .groups = "drop") %>% 
  mutate(Expenditures = cumsum(Expenditures))

#> # A tibble: 3 x 2
#>   Month Expenditures
#>   <int>        <int>
#> 1     1            3
#> 2     2           12
#> 3     3           19

Or in base R:

data.frame(Month = unique(df$Month), 
           Expenditure = cumsum(tapply(df$Expenditures, df$Month, sum)))
#>   Month Expenditure
#> 1     1           3
#> 2     2          12
#> 3     3          19
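
One caveat with the base R version: unique() returns months in order of appearance, while tapply() returns totals sorted by month, so the two only line up when the rows are already sorted by Month. A minimal sketch that takes the month labels from the tapply() result instead is shown below; the shuffled data frame is a hypothetical input, not the asker's data.

```r
# Hypothetical input: same values as in the question, but rows out of order
shuffled <- data.frame(
  Month        = c(2L, 1L, 3L, 2L, 1L, 3L),
  Expenditures = c(3L, 1L, 2L, 6L, 2L, 5L)
)

# Monthly totals; the names of the result are the sorted month labels
totals <- tapply(shuffled$Expenditures, shuffled$Month, sum)

# Take Month from those names so labels and totals cannot get out of step
result <- data.frame(
  Month        = as.integer(names(totals)),
  Expenditures = as.vector(cumsum(totals))
)
result
```

The dplyr version does not have this issue, because summarise() carries the group key along with each total.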

Answer 2 (score: 1)

We can use base R:

out <- with(df1, rowsum(Expenditures, Month))
data.frame(Month = row.names(out), Expenditure = cumsum(out))
#  Month Expenditure
#1     1           3
#2     2          12
#3     3          19

Or, more compactly:

with(df1, stack(cumsum(rowsum(Expenditures, Month)[,1])))[2:1]

Data:

df1 <- structure(list(Month = c(1L, 1L, 2L, 2L, 3L, 3L), Expenditures = c(1L, 
2L, 3L, 6L, 2L, 5L)), class = "data.frame", row.names = c(NA, 
-6L))
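
Note that row.names(out) is a character vector, so Month comes back as text in the first variant. If an integer Month is needed, a small variation along these lines should work; the names `spend` and `res` are placeholders for this sketch.

```r
# Same sample data (the name `spend` is hypothetical)
spend <- data.frame(
  Month        = c(1L, 1L, 2L, 2L, 3L, 3L),
  Expenditures = c(1L, 2L, 3L, 6L, 2L, 5L)
)

# rowsum() returns a one-column matrix whose row names are the month labels
out <- rowsum(spend$Expenditures, spend$Month)

# Convert the labels back to integers and accumulate the single column
res <- data.frame(
  Month        = as.integer(rownames(out)),
  Expenditures = cumsum(out[, 1])
)
res
```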

Answer 3 (score: 1)

Here is another base R option using subset + ave:

subset(
  transform(df, Expenditures = cumsum(Expenditures)),
  ave(rep(FALSE, nrow(df)), Month, FUN = function(x) seq_along(x) == length(x))
)

which gives:

  Month Expenditures
2     1            3
4     2           12
6     3           19
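
To see why this works: the ave() call builds a logical mask that is TRUE only on the last row of each month, and the running total on those rows is exactly the cumulative monthly figure. This relies on the rows being sorted by Month. The sketch below prints the mask and compares it with an alternative built from duplicated(); the name `spend` is a placeholder.

```r
# Same sample data (the name `spend` is hypothetical)
spend <- data.frame(
  Month        = c(1L, 1L, 2L, 2L, 3L, 3L),
  Expenditures = c(1L, 2L, 3L, 6L, 2L, 5L)
)

# Mask from the answer: TRUE on the last row within each month
last_ave <- ave(rep(FALSE, nrow(spend)), spend$Month,
                FUN = function(x) seq_along(x) == length(x))
last_ave
#> [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE

# Alternative mask: a row is "last" if its month does not occur again later
last_dup <- !duplicated(spend$Month, fromLast = TRUE)

# Keep only the last row of each month from the running total
res <- transform(spend, Expenditures = cumsum(Expenditures))[last_dup, ]
res
```

Both masks select rows 2, 4 and 6, which is why the original row names survive in the output.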