ddply transform in R (percentage change)

Date: 2016-08-08 16:08:00

Tags: r dplyr plyr

I have a data.frame that looks like this:

Brand       Year       EUR
Brand1      2015       10
Brand1      2016       20
Brand2      2015       100
Brand2      2016       500
Brand3      2015       25
Brand4      2015       455
...

I am also attaching the following code to reproduce it:

library(plyr)
library(dplyr)
library(scales)

set.seed(1992)
n=68

Year <- sample(c("2015", "2016"), n, replace = TRUE, prob = NULL)
Brand <- sample("Brand", n, replace = TRUE, prob = NULL)
Brand <- paste0(Brand, sample(1:5, n, replace = TRUE, prob = NULL))
EUR <- abs(rnorm(n))*100000

df <- data.frame(Year, Brand, EUR)

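I need some additional data transformations (adding more columns) for my future research.

First, I compute the label positions (for my future charts) and call the column pos: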

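# Total EUR per Brand and Year; the result stays grouped by Brand
df.summary = df %>% group_by(Brand, Year) %>% 
  summarise(EUR = sum(EUR)) %>%
  # pos is the midpoint of each stacked segment within a Brand
  mutate( pos = cumsum(EUR)-0.5*EUR)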

What I want to do is compute the percentage growth from one Year to the next for each Brand. So I added this line:

df.summary = ddply(df.summary, .(Brand), transform, 
               pChange = (sum(df.summary[df.summary$Year == "2016",]$EUR)/
                         sum(df.summary[df.summary$Year == "2015",]$EUR) )-1  
                     )

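However, what I get is a single constant value: the growth over my whole data frame, repeated for every Brand.

Can you help me compute the percentage change for each brand?

Thanks!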

1 answer:

Answer 0: (score: 4)

Also, this is easier if you use lag:

df.summary %>% group_by(Brand) %>% 
      mutate(pChange = (EUR - lag(EUR))/lag(EUR) * 100)

# Source: local data frame [10 x 5]
#Groups: Brand [5]
#
#    Brand   Year      EUR      pos   pChange
#   <fctr> <fctr>    <dbl>    <dbl>     <dbl>
#1  Brand1   2015 637896.7 318948.3        NA
#2  Brand1   2016 721944.2 998868.8  13.17573
#3  Brand2   2015 708697.6 354348.8        NA
#4  Brand2   2016 300541.1 858968.2 -57.59248
#5  Brand3   2015 454890.1 227445.1        NA
#6  Brand3   2016 576095.6 742937.9  26.64500
#7  Brand4   2015 305712.0 152856.0        NA
#8  Brand4   2016 174073.3 392748.6 -43.05970
#9  Brand5   2015 589970.7 294985.3        NA
#10 Brand5   2016 518510.2 849225.8 -12.11254
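
To make the behaviour of lag easier to check by hand, here is a minimal sketch on a few rows taken from the sample table in the question. The name toy_lag is only for illustration, and the rows are assumed to be already ordered by Year within each Brand. The first row of every Brand has no previous value, so its pChange is NA, which is also what a Brand with a single Year (Brand3 here) gets.

```r
library(dplyr)

# Hypothetical toy data built from the question's sample rows
toy_lag <- data.frame(
  Brand = c("Brand1", "Brand1", "Brand2", "Brand2", "Brand3"),
  Year  = c("2015", "2016", "2015", "2016", "2015"),
  EUR   = c(10, 20, 100, 500, 25),
  stringsAsFactors = FALSE
)

toy_lag_result <- toy_lag %>%
  group_by(Brand) %>%
  # dplyr::lag() shifts EUR down by one row inside each Brand
  mutate(pChange = (EUR - dplyr::lag(EUR)) / dplyr::lag(EUR) * 100) %>%
  ungroup()

print(toy_lag_result)
```

Brand1 goes from 10 to 20, so its 2016 row shows 100 (percent); Brand2 goes from 100 to 500, so its 2016 row shows 400.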

Following @r2evans's suggestion, if Year has not been sorted beforehand:

df.summary %>% group_by(Brand) %>% arrange(Year) %>%
          mutate(pChange = (EUR - lag(EUR))/lag(EUR) * 100)
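
As for why the ddply attempt in the question returns one constant: inside transform, the expression indexes df.summary by name, and that name refers to the full data frame rather than to the per-Brand piece, so every group receives the overall ratio. A minimal sketch of a likely fix is to refer to the piece's own columns (EUR and Year) directly. The toy data frame toy_plyr below is made up from the question's sample rows, and the result is a fraction (1 means +100%), matching the question's formula rather than the percent scale used above.

```r
library(plyr)

# Hypothetical toy data built from the question's sample rows
toy_plyr <- data.frame(
  Brand = c("Brand1", "Brand1", "Brand2", "Brand2"),
  Year  = c("2015", "2016", "2015", "2016"),
  EUR   = c(10, 20, 100, 500),
  stringsAsFactors = FALSE
)

# EUR and Year are looked up in the current Brand's piece,
# so the ratio is computed separately for each Brand
toy_plyr_result <- ddply(toy_plyr, .(Brand), transform,
                         pChange = sum(EUR[Year == "2016"]) /
                                   sum(EUR[Year == "2015"]) - 1)

print(toy_plyr_result)
```

A Brand with no 2015 rows would have a zero denominator here and yield Inf or NaN, so such cases may need separate handling.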