为R中的不同因素创建月增长率列

时间:2015-03-11 22:29:10

标签: r

假设我有以下数据

set.seed(123)

Company <-  c(rep("Company 1",5),rep("Company 2",10))

Dates <- c(seq(as.Date("2014-09-01"), as.Date("2015-01-01"), by="months"),
  seq(as.Date("2011-09-01"), as.Date("2012-06-01"), by="months"))

X.1 <- sample(c(0,0,5,10,15,20,25,30,40,50),size=15,replace=TRUE)
X.2 <- sample(c(0,0,5,10,15,20,25,30,40,50),size=15,replace=TRUE)

df <- data.frame(Dates,Company,X.1,X.2)



       Dates   Company X.1 X.2
1  2014-09-01 Company 1  50   0
2  2014-10-01 Company 1  50   5
3  2014-11-01 Company 1  25  15
4  2014-12-01 Company 1  30   5
5  2015-01-01 Company 1   0  40
6  2011-09-01 Company 2  15   0
7  2011-10-01 Company 2  30  15
8  2011-11-01 Company 2   5  30
9  2011-12-01 Company 2  10   0
10 2012-01-01 Company 2   5  20
11 2012-02-01 Company 2   0   5
12 2012-03-01 Company 2  15   0
13 2012-04-01 Company 2  15  30
14 2012-05-01 Company 2  10  40
15 2012-06-01 Company 2   0  10

我要做的是找到变量X.1X.2的每月增长率 在每个公司内,并将这些列绑定到数据框的右侧。这里的问题是每个公司的日期范围不同,这就是我遇到这个问题的原因。此外,由于我在数据Inf中有0,而NA是可以的。

感谢您的帮助。

1 个答案:

答案 0 :(得分:1)

#I computed the growth using log: growth=log(X(t)/X(t-1)). If you want to compute using (X(t)-X(t-1))/X(t-1), you can just use that. Also, for the first period, it will be NA

#Assumption: the data are equally spaced for each company. You get Inf, if your last period value is 0 and -Inf if your current period value is 0 (because we used log). The growth will be 0 (when current period is zero) if you don't use log (see second method)

library(dplyr) 
    df%>%
      group_by(Company)%>%
      mutate(gx_1=log(X.1/lag(X.1,1)),gx_2=log(X.1/lag(X.2,1))
     )
    Source: local data frame [15 x 6]
    Groups: Company

            Dates   Company X.1 X.2       gx_1       gx_2
    1  2014-09-01 Company 1   5  40         NA         NA
    2  2014-10-01 Company 1  30   5  1.7917595 -0.2876821
    3  2014-11-01 Company 1  15   0 -0.6931472  1.0986123
    4  2014-12-01 Company 1  40  10  0.9808293        Inf
    5  2015-01-01 Company 1  50  50  0.2231436  1.6094379
    6  2011-09-01 Company 2   0  40         NA         NA
    7  2011-10-01 Company 2  20  25        Inf -0.6931472
    8  2011-11-01 Company 2  40  25  0.6931472  0.4700036
    9  2011-12-01 Company 2  20  50 -0.6931472 -0.2231436
    10 2012-01-01 Company 2  15  25 -0.2876821 -1.2039728
    11 2012-02-01 Company 2  50  30  1.2039728  0.6931472
    12 2012-03-01 Company 2  15  20 -1.2039728 -0.6931472
    13 2012-04-01 Company 2  25  20  0.5108256  0.2231436
    14 2012-05-01 Company 2  20   5 -0.2231436  0.0000000
    15 2012-06-01 Company 2   0   0       -Inf       -Inf

#without using log , i.e.
df%>%
     group_by(Company)%>%
     mutate(gx_1=((X.1-lag(X.1,1))/lag(X.1,1)),gx_2=((X.2-lag(X.2,1))/lag(X.2,1)))