使用dplyr计算R中的百分比变化

时间:2018-01-10 21:41:40

标签: r dplyr

我想通过Profit来计算YEAR的百分比,这是一项相当简单的任务但不知怎的,我得到NA。我已经检查了之前提出的相同问题,但我无法理解为什么会收到NA。数据如下:

> df_vertical_growth
   YEAR                        VERTICAL   Profit pct_change
1  2017                     AGRICULTURE        0         NA
2  2016                     AGRICULTURE  2053358         NA
3  2015                     AGRICULTURE        0         NA
4  2014                     AGRICULTURE  2370747         NA
5  2013                     AGRICULTURE  4066693         NA
6  2017                   COMMUNICATION        0         NA
7  2016                   COMMUNICATION  1680074         NA
8  2015                   COMMUNICATION  1322470         NA
9  2014                   COMMUNICATION  1460133         NA
10 2013                   COMMUNICATION  1529863         NA
11 2017                    CONSTRUCTION        0         NA
12 2016                    CONSTRUCTION        0         NA
13 2015                    CONSTRUCTION        0         NA
14 2014                    CONSTRUCTION  8250149         NA
15 2013                    CONSTRUCTION        0         NA
16 2017                       EDUCATION        0         NA
17 2016                       EDUCATION 12497015         NA
18 2015                       EDUCATION 13437356         NA
19 2014                       EDUCATION 10856685         NA
20 2013                       EDUCATION 13881127         NA
21 2017 FINANCE, INSURANCE, REAL ESTATE        0         NA
22 2016 FINANCE, INSURANCE, REAL ESTATE        0         NA
23 2015 FINANCE, INSURANCE, REAL ESTATE        0         NA
24 2014 FINANCE, INSURANCE, REAL ESTATE        0         NA
25 2013 FINANCE, INSURANCE, REAL ESTATE  5008436         NA
26 2017                      HEALTHCARE        0         NA
27 2016                      HEALTHCARE        0         NA
28 2015                      HEALTHCARE        0         NA
29 2014                      HEALTHCARE  4554364         NA
30 2013                      HEALTHCARE  5078130         NA
31 2017                     HOSPITALITY        0         NA
32 2016                     HOSPITALITY  4445512         NA
33 2015                     HOSPITALITY  5499419         NA
34 2014                     HOSPITALITY  9060639         NA
35 2013                     HOSPITALITY  4391522         NA
36 2017                   MANUFACTURING        0         NA
37 2016                   MANUFACTURING        0         NA
38 2015                   MANUFACTURING        0         NA
39 2014                   MANUFACTURING        0         NA
40 2013                   MANUFACTURING 27466974         NA
41 2017                          MINING        0         NA
42 2016                          MINING  4359251         NA
43 2015                          MINING  4163201         NA
44 2014                          MINING  6272530         NA
45 2013                          MINING  6668191         NA
46 2017                           OTHER        0         NA
47 2016                           OTHER        0         NA
48 2015                           OTHER        0         NA
49 2014                           OTHER  5935199         NA
50 2013                           OTHER  3585969         NA
51 2017                    PUBLIC ADMIN        0         NA
52 2016                    PUBLIC ADMIN        0         NA
53 2015                    PUBLIC ADMIN        0         NA
54 2014                    PUBLIC ADMIN        0         NA
55 2013                    PUBLIC ADMIN        0         NA
56 2017                    RETAIL TRADE        0         NA
57 2016                    RETAIL TRADE        0         NA
58 2015                    RETAIL TRADE        0         NA
59 2014                    RETAIL TRADE        0         NA
60 2013                    RETAIL TRADE        0         NA
61 2017                         SERVICE        0         NA
62 2016                         SERVICE        0         NA
63 2015                         SERVICE        0         NA
64 2014                         SERVICE        0         NA
65 2013                         SERVICE 28018522         NA
66 2017                  TRANSPORTATION        0         NA
67 2016                  TRANSPORTATION        0         NA
68 2015                  TRANSPORTATION        0         NA
69 2014                  TRANSPORTATION        0         NA
70 2013                  TRANSPORTATION  8430244         NA
71 2017                         UTILITY        0         NA
72 2016                         UTILITY  3551989         NA
73 2015                         UTILITY  6535248         NA
74 2014                         UTILITY  3995486         NA
75 2013                         UTILITY  4477617         NA
76 2017                 WHOLESALE TRADE        0         NA
77 2016                 WHOLESALE TRADE  6898041         NA
78 2015                 WHOLESALE TRADE  7120828         NA
79 2014                 WHOLESALE TRADE        0         NA
80 2013                 WHOLESALE TRADE        0         NA

我的代码:

df_vertical_growth %>% group_by(YEAR, VERTICAL) %>% 
     mutate(pct_change = ((Profit/lag(Profit) - 1) * 100))

现在,根据此处提供的答案How can I calculate the percentage change within a group for multiple columns in R?,还尝试执行以下操作:

pct <- function(x) {x / lag(x) - 1}
df_vertical_growth %>% group_by(YEAR, VERTICAL) %>% mutate_at(funs=pct,Profit)

但我收到以下错误:

  

check_dot_cols(.vars,.cols)出错:对象&#39;利润&#39;找不到

有人可以告诉我,我做错了什么?非常感谢。

2 个答案:

答案 0 :(得分:7)

问题在于每个小组都有一个观察点。每个垂直一个独特的年份。一次观察的滞后是什么?此外,由于岁月按降序排列,我相信你需要领导。

library(tidyverse)
z %>%
  group_by(VERTICAL) %>% 
  mutate(pct_change = (Profit/lead(Profit) - 1) * 100)
#output
    YEAR VERTICAL       Profit pct_change
   <int> <fctr>          <int>      <dbl>
 1  2017 AGRICULTURE         0    -100   
 2  2016 AGRICULTURE   2053358     Inf   
 3  2015 AGRICULTURE         0    -100   
 4  2014 AGRICULTURE   2370747    - 41.7 
 5  2013 AGRICULTURE   4066693      NA   
 6  2017 COMMUNICATION       0    -100   
 7  2016 COMMUNICATION 1680074      27.0 
 8  2015 COMMUNICATION 1322470    -  9.43
 9  2014 COMMUNICATION 1460133    -  4.56
10  2013 COMMUNICATION 1529863      NA   

此解决方案假设年份按正确顺序排列,以确保:

z %>%
  group_by(VERTICAL) %>% 
  arrange(YEAR, .by_group = TRUE) %>%
  mutate(pct_change = (Profit/lag(Profit) - 1) * 100)
#output
    YEAR VERTICAL       Profit pct_change
   <int> <fctr>          <int>      <dbl>
 1  2013 AGRICULTURE   4066693      NA   
 2  2014 AGRICULTURE   2370747    - 41.7 
 3  2015 AGRICULTURE         0    -100   
 4  2016 AGRICULTURE   2053358     Inf   
 5  2017 AGRICULTURE         0    -100   
 6  2013 COMMUNICATION 1529863      NA   
 7  2014 COMMUNICATION 1460133    -  4.56
 8  2015 COMMUNICATION 1322470    -  9.43
 9  2016 COMMUNICATION 1680074      27.0 
10  2017 COMMUNICATION       0    -100   

或使用

arrange(desc(YEAR), .by_group = TRUE)

lead

z是:

structure(list(YEAR = c(2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 
2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 
2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 
2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 
2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 
2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 
2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L
), VERTICAL = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 
6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 9L, 
9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 
11L, 12L, 12L, 12L, 12L, 12L), .Label = c("AGRICULTURE", "COMMUNICATION", 
"CONSTRUCTION", "EDUCATION", "HEALTHCARE", "HOSPITALITY", "MANUFACTURING", 
"MINING", "OTHER", "SERVICE", "TRANSPORTATION", "UTILITY"), class = "factor"), 
    Profit = c(0L, 2053358L, 0L, 2370747L, 4066693L, 0L, 1680074L, 
    1322470L, 1460133L, 1529863L, 0L, 0L, 0L, 8250149L, 0L, 0L, 
    12497015L, 13437356L, 10856685L, 13881127L, 0L, 0L, 0L, 4554364L, 
    5078130L, 0L, 4445512L, 5499419L, 9060639L, 4391522L, 0L, 
    0L, 0L, 0L, 27466974L, 0L, 4359251L, 4163201L, 6272530L, 
    6668191L, 0L, 0L, 0L, 5935199L, 3585969L, 0L, 0L, 0L, 0L, 
    28018522L, 0L, 0L, 0L, 0L, 8430244L, 0L, 3551989L, 6535248L, 
    3995486L, 4477617L)), .Names = c("YEAR", "VERTICAL", "Profit"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", 
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", 
"26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", 
"37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", 
"48", "49", "50", "61", "62", "63", "64", "65", "66", "67", "68", 
"69", "70", "71", "72", "73", "74", "75"), class = "data.frame")

答案 1 :(得分:0)

假设您的Profit列代表给定年份的利润,此函数将计算年n和年n-1之间的差值,除以年n-1的值,然后乘以100得到一个百分比。如果年份n-1中的值为零,则没有有效的百分比变化。您必须仅按VERTICAL而不是YEAR对数据进行分组。

profit_pct_change <- function(x) {
  x <- x[order(x$YEAR, decreasing = TRUE), ] # Confirms ordered by decreasing year
  pct_change <- -diff(x$Profit)/x$Profit[-1] * 100 # Gets percent change in profit from preceding year
  data.frame(year = x$YEAR[-length(x$YEAR)], pct_change = pct_change) # Returns data frame
}

df_vertical_growth %>% 
  group_by(VERTICAL) %>%
  do(profit_pct_change(.))