Question

I am working in R with a dataset that looks like this:

test=data.frame("1991" = c(1,5,3), "1992" = c(4,3,3), "1993" = c(10,5,3), "1994" = c(1,1,1), "1995" = c(2,2,6))
test=plyr::rename(test, c("X1991"="1991", "X1992"="1992", "X1993"="1993", "X1994"="1994", "X1995"="1995"))

What I want to do is create variables named Pre1991, Pre1992, Pre1993, ... that store the cumulative value up to and including that year, for example

Pre1991 = test$`1991`
Pre1992 = test$`1991` + test$`1992`
Pre1993 = test$`1991` + test$`1992` + test$`1993`

and so on.

My real dataset has variables for 1900-2017, so I can't do this by hand. I tried writing a for loop, but it didn't work.

for (i in 1900:2017){
  x = paste0("Pre",i)
  df[[x]] = rowSums(df[,(colnames(df)<=i)]) 
}

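Can someone help review my code or suggest another approach? Thanks!

Edit 1:

Thanks a lot! I am also wondering whether there is a way to use cumsum in reverse. For example, if I am interested in what happened after a particular year: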

Post1991 = test$`1992` + test$`1993` + test$`1994` + test$`1995` + ...
Post1992 = test$`1993` + test$`1994` + test$`1995` + ...
Post1993 = test$`1994` + test$`1995` + ...

Answer 1

This is a bit inefficient, since it converts from data.frame to matrix and back, but ...

as.data.frame(t(apply(as.matrix(test), 1, cumsum)))
#   1991 1992 1993 1994 1995
# 1    1    5   15   16   18
# 2    5    8   13   14   16
# 3    3    6    9   10   16

If your data has other columns that are not year-based, such as

test$quux <- LETTERS[3:5]
test
#   1991 1992 1993 1994 1995 quux
# 1    1    4   10    1    2    C
# 2    5    3    5    1    2    D
# 3    3    3    3    1    6    E

then subset on both sides of the assignment:

test[1:5] <- as.data.frame(t(apply(as.matrix(test[1:5]), 1, cumsum)))
test
#   1991 1992 1993 1994 1995 quux
# 1    1    5   15   16   18    C
# 2    5    8   13   14   16    D
# 3    3    6    9   10   16    E
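
If the goal is to keep the original columns and add Pre1991, Pre1992, ... next to them, as in the question, something along these lines may work. This is only a sketch: it rebuilds the sample data with check.names = FALSE instead of plyr::rename, and it assumes the year columns are exactly the ones with four-digit names and that they already sit in chronological order. year_cols, pre and result are names made up for the illustration.

```r
# Sample data from the question; check.names = FALSE keeps "1991" etc. as-is
test <- data.frame("1991" = c(1, 5, 3), "1992" = c(4, 3, 3),
                   "1993" = c(10, 5, 3), "1994" = c(1, 1, 1),
                   "1995" = c(2, 2, 6), check.names = FALSE)
test$quux <- LETTERS[3:5]

# Select the year columns by name instead of by position (assumed pattern)
year_cols <- grep("^[0-9]{4}$", names(test), value = TRUE)

# Row-wise cumulative sums; apply() puts years in rows, so transpose back
pre <- t(apply(as.matrix(test[year_cols]), 1, cumsum))
colnames(pre) <- paste0("Pre", year_cols)

# Original columns stay untouched, the Pre* columns are appended
result <- cbind(test, pre)
result
```

Selecting by name means extra columns such as quux can sit anywhere in the data frame without shifting the indices.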

Edit

For the reverse, just use rev twice:

as.data.frame(t(apply(as.matrix(test), 1, function(a) rev(cumsum(rev(a)))-a)))
#   1991 1992 1993 1994 1995
# 1   17   13    3    2    0
# 2   11    8    3    2    0
# 3   13   10    7    6    0
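
To get the result under the Post1991, Post1992, ... names from the question, the same double-rev trick can be wrapped up roughly as follows. This is a sketch on the original sample data (rebuilt here with check.names = FALSE so it runs on its own); sum_after and post are hypothetical names.

```r
# Sample data from the question, with the year names kept as-is
test <- data.frame("1991" = c(1, 5, 3), "1992" = c(4, 3, 3),
                   "1993" = c(10, 5, 3), "1994" = c(1, 1, 1),
                   "1995" = c(2, 2, 6), check.names = FALSE)

# Sum of everything strictly after each position:
# reversed cumulative sum, minus the value at the position itself
sum_after <- function(a) rev(cumsum(rev(a))) - a

post <- as.data.frame(t(apply(as.matrix(test), 1, sum_after)))
names(post) <- paste0("Post", names(test))
post
```

Subtracting a is what excludes the year itself; without it, each column would hold the sum from that year onward.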

Answer 2

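Using the tidyverse, we can gather the data into long format and compute there before spreading it back to wide format. For this to work, the data needs to be sorted first.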

library(tidyverse)
test <- data.frame("1991" = c(1, 5, 3),
                   "1992" = c(4, 3, 3),
                   "1993" = c(10, 5, 3),
                   "1994" = c(1, 1, 1),
                   "1995" = c(2, 2, 6))
test <- plyr::rename(test, c("X1991" = "1991",
                             "X1992" = "1992",
                             "X1993" = "1993",
                             "X1994" = "1994",
                             "X1995" = "1995"))

Forward

test %>%
  mutate(id = 1:nrow(.)) %>% # adding an ID to identify groups
  gather(year, value, -id) %>% # wide to long format
  arrange(id, year) %>%
  group_by(id) %>%
  mutate(value = cumsum(value)) %>% 
  ungroup() %>%
  spread(year, value) %>%  # long to wide format
  select(-id) %>%
  setNames(paste0("pre", names(.))) # add prefix to columns

##  A tibble: 3 x 5
#   pre1991 pre1992 pre1993 pre1994 pre1995
#     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
# 1      1.      5.     15.     16.     18.
# 2      5.      8.     13.     14.     16.
# 3      3.      6.      9.     10.     16.

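Reverse

Your definition is not strictly the forward sum in reversed order, because the year itself is excluded. The reverse case is therefore a cumulative sum of the lagged values.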

test %>%
  mutate(id = 1:nrow(.)) %>%
  gather(year, value, -id) %>%
  arrange(id, desc(year)) %>% # using desc() to reverse sorting
  group_by(id) %>%
  mutate(value = cumsum(lag(value, default = 0))) %>% # lag cumsum
  ungroup() %>%
  spread(year, value) %>%
  select(-id) %>%
  setNames(paste0("post", names(.)))


## A tibble: 3 x 5
#   post1991 post1992 post1993 post1994 post1995
#      <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
# 1      17.      13.       3.       2.       0.
# 2      11.       8.       3.       2.       0.
# 3      13.      10.       7.       6.       0.
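
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A rough equivalent that computes both directions in one pass could look like this. It is a sketch, not a drop-in replacement for the pipelines above: it assumes a tidyr version with pivot_wider(), rebuilds the data with check.names = FALSE, and uses the group total minus the running sum for the "post" values. wide is a name made up here.

```r
library(dplyr)
library(tidyr)

test <- data.frame("1991" = c(1, 5, 3), "1992" = c(4, 3, 3),
                   "1993" = c(10, 5, 3), "1994" = c(1, 1, 1),
                   "1995" = c(2, 2, 6), check.names = FALSE)

wide <- test %>%
  mutate(id = row_number()) %>%                      # one id per original row
  pivot_longer(-id, names_to = "year", values_to = "value") %>%
  arrange(id, year) %>%                              # years in order within each row
  group_by(id) %>%
  mutate(pre  = cumsum(value),                       # sum up to and including the year
         post = sum(value) - pre) %>%                # sum strictly after the year
  ungroup() %>%
  pivot_wider(id_cols = id, names_from = year,
              values_from = c(pre, post), names_sep = "") %>%
  select(-id)

wide
```

With names_sep = "" the new columns come out as pre1991, ..., post1995, which matches the prefixes used above.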

Answer 3

We can use rowCumsums from matrixStats

library(matrixStats)
test[] <- rowCumsums(as.matrix(test))
test
#  1991 1992 1993 1994 1995
#1    1    5   15   16   18
#2    5    8   13   14   16
#3    3    6    9   10   16
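
For the reverse sums asked about in the edit, one option is to run rowCumsums over the columns in reversed order and flip the result back. This is a sketch under the assumption that every column is a numeric year column in chronological order; m, rev_idx and post are illustrative names, and the dimnames are set explicitly so the example does not rely on how rowCumsums handles names.

```r
library(matrixStats)

test <- data.frame("1991" = c(1, 5, 3), "1992" = c(4, 3, 3),
                   "1993" = c(10, 5, 3), "1994" = c(1, 1, 1),
                   "1995" = c(2, 2, 6), check.names = FALSE)

m <- as.matrix(test)
rev_idx <- rev(seq_len(ncol(m)))

# Cumulative sums from the last year backwards, flipped into original order,
# minus the year itself so only later years are counted
post <- rowCumsums(m[, rev_idx, drop = FALSE])[, rev_idx, drop = FALSE] - m
dimnames(post) <- list(NULL, paste0("Post", colnames(m)))

as.data.frame(post)
```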
