How can I use library(dplyr) on panel data to compute a new column from the ratio between the current row and the previous row?

Time: 2016-07-03 16:42:20

Tags: r

I want to create a new column, computed within each Name group, from data like this:

library(dplyr)

dates <- as.Date(as.character(c("2011-01-13",
                           "2011-01-14",
                           "2011-01-15",
                           "2011-01-16",
                           "2011-01-17",
                           "2011-01-13",
                           "2011-01-14",
                           "2011-01-15",
                           "2011-01-16",
                           "2011-01-17",
                           "2011-01-13",
                           "2011-01-14",
                           "2011-01-15",
                           "2011-01-16",
                           "2011-01-17")))

Name <-c("Andy","Andy","Andy","Andy","Andy","Jo","Jo","Jo","Jo","Jo","Me","Me","Me","Me",'Me')
contribution<- c(2,3,2,2,3,4,5,10,4,10,1,2,4,1,5)
# put together
data <- data.frame(Name, dates, contribution)

#   Name      dates contribution
#1  Andy 2011-01-13            2
#2  Andy 2011-01-14            3
#3  Andy 2011-01-15            2
#4  Andy 2011-01-16            2
#5  Andy 2011-01-17            3
#6    Jo 2011-01-13            4
#7    Jo 2011-01-14            5
#8    Jo 2011-01-15           10
#9    Jo 2011-01-16            4
#10   Jo 2011-01-17           10
#11   Me 2011-01-13            1
#12   Me 2011-01-14            2
#13   Me 2011-01-15            4
#14   Me 2011-01-16            1
#15   Me 2011-01-17            5

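The new column should be (contribution - lag(contribution)) / (1 + lag(contribution)),

where lag() is simply the value from the previous row (current row - 1). For example, for Andy's second row:

(3 - 2) / (1 + 2) = 1/3 = 0.333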

I tried:

data %>%  group_by(Name) %>% mutate(change = (contribution-lag(contribution)/1+lag(contribution)))

#     Name      dates contribution change
#1    Andy 2011-01-13            2     NA
#2    Andy 2011-01-14            3      3
#3    Andy 2011-01-15            2      2
#4    Andy 2011-01-16            2      2
#5    Andy 2011-01-17            3      3
#6      Jo 2011-01-13            4     NA
#7      Jo 2011-01-14            5      5
#8      Jo 2011-01-15           10     10
#9      Jo 2011-01-16            4      4
#10     Jo 2011-01-17           10     10
#11     Me 2011-01-13            1     NA
#12     Me 2011-01-14            2      2
#13     Me 2011-01-15            4      4
#14     Me 2011-01-16            1      1
#15     Me 2011-01-17            5      5
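
The change column above simply repeats contribution, and the reason can be checked without dplyr. The following is a minimal base-R sketch using only Andy's five values; the vector named previous is a hypothetical stand-in for lag(contribution), built by hand rather than taken from the question.

```r
# Andy's contributions, taken from the data above.
contribution <- c(2, 3, 2, 2, 3)

# Hypothetical base-R stand-in for dplyr::lag(): shift by one row, pad with NA.
previous <- c(NA, head(contribution, -1))

# The attempted expression: division is evaluated before + and -,
# so this is contribution - (previous / 1) + previous.
attempt <- contribution - previous / 1 + previous

# The intended expression, with numerator and denominator parenthesised.
intended <- (contribution - previous) / (1 + previous)

print(attempt)
print(intended)
```

Apart from the leading NA, attempt equals contribution, which matches the output shown above.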

1 answer:

Answer 0 (score: 1)

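We need parentheses around 1 + lag(contribution), and around the numerator as well. Because / binds more tightly than + and -, the original expression is evaluated as contribution - lag(contribution)/1 + lag(contribution), which simplifies to contribution:
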
data %>%
    group_by(Name) %>% 
    mutate(change = (contribution - lag(contribution))/(1+ lag(contribution)))
#     Name      dates contribution     change
#   <fctr>     <date>        <dbl>      <dbl>
#1    Andy 2011-01-13            2         NA
#2    Andy 2011-01-14            3  0.3333333
#3    Andy 2011-01-15            2 -0.2500000
#4    Andy 2011-01-16            2  0.0000000
#5    Andy 2011-01-17            3  0.3333333
#6      Jo 2011-01-13            4         NA
#7      Jo 2011-01-14            5  0.2000000
#8      Jo 2011-01-15           10  0.8333333
#9      Jo 2011-01-16            4 -0.5454545
#10     Jo 2011-01-17           10  1.2000000
#11     Me 2011-01-13            1         NA
#12     Me 2011-01-14            2  0.5000000
#13     Me 2011-01-15            4  0.6666667
#14     Me 2011-01-16            1 -0.6000000
#15     Me 2011-01-17            5  2.0000000
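
One caveat that the answer does not mention: lag() works on row order, so the result is only meaningful when each person's rows are already sorted by date, as they are in the question's data. The sketch below shows one possible way to guard against unsorted input by calling arrange() first. The shuffled data frame is a small hypothetical sample, not the question's full data.

```r
library(dplyr)

# Hypothetical, deliberately shuffled subset of the question's data.
shuffled <- data.frame(
  Name = c("Jo", "Andy", "Jo", "Andy", "Andy", "Jo"),
  dates = as.Date(c("2011-01-15", "2011-01-14", "2011-01-13",
                    "2011-01-13", "2011-01-15", "2011-01-14")),
  contribution = c(10, 3, 4, 2, 2, 5)
)

result <- shuffled %>%
  arrange(Name, dates) %>%   # put each person's rows in date order first
  group_by(Name) %>%         # so lag() never reaches across people
  mutate(change = (contribution - lag(contribution)) / (1 + lag(contribution))) %>%
  ungroup()

print(result)
```

The first row of each group is still NA, because it has no previous row to compare against.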