假设我有以下数据
set.seed(123)
Company <- c(rep("Company 1",5),rep("Company 2",10))
Dates <- c(seq(as.Date("2014-09-01"), as.Date("2015-01-01"), by="months"),
seq(as.Date("2011-09-01"), as.Date("2012-06-01"), by="months"))
X.1 <- sample(c(0,0,5,10,15,20,25,30,40,50),size=15,replace=TRUE)
X.2 <- sample(c(0,0,5,10,15,20,25,30,40,50),size=15,replace=TRUE)
df <- data.frame(Dates,Company,X.1,X.2)
Dates Company X.1 X.2
1 2014-09-01 Company 1 50 0
2 2014-10-01 Company 1 50 5
3 2014-11-01 Company 1 25 15
4 2014-12-01 Company 1 30 5
5 2015-01-01 Company 1 0 40
6 2011-09-01 Company 2 15 0
7 2011-10-01 Company 2 30 15
8 2011-11-01 Company 2 5 30
9 2011-12-01 Company 2 10 0
10 2012-01-01 Company 2 5 20
11 2012-02-01 Company 2 0 5
12 2012-03-01 Company 2 15 0
13 2012-04-01 Company 2 15 30
14 2012-05-01 Company 2 10 40
15 2012-06-01 Company 2 0 10
我要做的是找到变量X.1
和X.2
的每月增长率
在每个公司内,并将这些列绑定到数据框的右侧。这里的问题是每个公司的日期范围不同,这就是我遇到这个问题的原因。此外,由于我在数据Inf
中有0,而NA
是可以的。
感谢您的帮助。
答案 0 :(得分:1)
#I computed the growth using log: growth=log(X(t)/X(t-1)). If you want to
compute using (X(t)-X(t-1))/X(t-1), you can just use that. Also, for the first
period, it will be NA
。
#Assumption: the data are equally spaced for each company. You get Inf, if
your last period value is 0 and -Inf if your current period value is 0 (because we used log). The growth will be 0 (when current period is zero) if you don't use log (see second method)
library(dplyr)
df%>%
group_by(Company)%>%
mutate(gx_1=log(X.1/lag(X.1,1)),gx_2=log(X.1/lag(X.2,1))
)
Source: local data frame [15 x 6]
Groups: Company
Dates Company X.1 X.2 gx_1 gx_2
1 2014-09-01 Company 1 5 40 NA NA
2 2014-10-01 Company 1 30 5 1.7917595 -0.2876821
3 2014-11-01 Company 1 15 0 -0.6931472 1.0986123
4 2014-12-01 Company 1 40 10 0.9808293 Inf
5 2015-01-01 Company 1 50 50 0.2231436 1.6094379
6 2011-09-01 Company 2 0 40 NA NA
7 2011-10-01 Company 2 20 25 Inf -0.6931472
8 2011-11-01 Company 2 40 25 0.6931472 0.4700036
9 2011-12-01 Company 2 20 50 -0.6931472 -0.2231436
10 2012-01-01 Company 2 15 25 -0.2876821 -1.2039728
11 2012-02-01 Company 2 50 30 1.2039728 0.6931472
12 2012-03-01 Company 2 15 20 -1.2039728 -0.6931472
13 2012-04-01 Company 2 25 20 0.5108256 0.2231436
14 2012-05-01 Company 2 20 5 -0.2231436 0.0000000
15 2012-06-01 Company 2 0 0 -Inf -Inf
#without using log , i.e.
df%>%
group_by(Company)%>%
mutate(gx_1=((X.1-lag(X.1,1))/lag(X.1,1)),gx_2=((X.2-lag(X.2,1))/lag(X.2,1)))