我想通过Profit
来计算YEAR
的百分比,这是一项相当简单的任务但不知怎的,我得到NA
。我已经检查了之前提出的相同问题,但我无法理解为什么会收到NA
。数据如下:
> df_vertical_growth
YEAR VERTICAL Profit pct_change
1 2017 AGRICULTURE 0 NA
2 2016 AGRICULTURE 2053358 NA
3 2015 AGRICULTURE 0 NA
4 2014 AGRICULTURE 2370747 NA
5 2013 AGRICULTURE 4066693 NA
6 2017 COMMUNICATION 0 NA
7 2016 COMMUNICATION 1680074 NA
8 2015 COMMUNICATION 1322470 NA
9 2014 COMMUNICATION 1460133 NA
10 2013 COMMUNICATION 1529863 NA
11 2017 CONSTRUCTION 0 NA
12 2016 CONSTRUCTION 0 NA
13 2015 CONSTRUCTION 0 NA
14 2014 CONSTRUCTION 8250149 NA
15 2013 CONSTRUCTION 0 NA
16 2017 EDUCATION 0 NA
17 2016 EDUCATION 12497015 NA
18 2015 EDUCATION 13437356 NA
19 2014 EDUCATION 10856685 NA
20 2013 EDUCATION 13881127 NA
21 2017 FINANCE, INSURANCE, REAL ESTATE 0 NA
22 2016 FINANCE, INSURANCE, REAL ESTATE 0 NA
23 2015 FINANCE, INSURANCE, REAL ESTATE 0 NA
24 2014 FINANCE, INSURANCE, REAL ESTATE 0 NA
25 2013 FINANCE, INSURANCE, REAL ESTATE 5008436 NA
26 2017 HEALTHCARE 0 NA
27 2016 HEALTHCARE 0 NA
28 2015 HEALTHCARE 0 NA
29 2014 HEALTHCARE 4554364 NA
30 2013 HEALTHCARE 5078130 NA
31 2017 HOSPITALITY 0 NA
32 2016 HOSPITALITY 4445512 NA
33 2015 HOSPITALITY 5499419 NA
34 2014 HOSPITALITY 9060639 NA
35 2013 HOSPITALITY 4391522 NA
36 2017 MANUFACTURING 0 NA
37 2016 MANUFACTURING 0 NA
38 2015 MANUFACTURING 0 NA
39 2014 MANUFACTURING 0 NA
40 2013 MANUFACTURING 27466974 NA
41 2017 MINING 0 NA
42 2016 MINING 4359251 NA
43 2015 MINING 4163201 NA
44 2014 MINING 6272530 NA
45 2013 MINING 6668191 NA
46 2017 OTHER 0 NA
47 2016 OTHER 0 NA
48 2015 OTHER 0 NA
49 2014 OTHER 5935199 NA
50 2013 OTHER 3585969 NA
51 2017 PUBLIC ADMIN 0 NA
52 2016 PUBLIC ADMIN 0 NA
53 2015 PUBLIC ADMIN 0 NA
54 2014 PUBLIC ADMIN 0 NA
55 2013 PUBLIC ADMIN 0 NA
56 2017 RETAIL TRADE 0 NA
57 2016 RETAIL TRADE 0 NA
58 2015 RETAIL TRADE 0 NA
59 2014 RETAIL TRADE 0 NA
60 2013 RETAIL TRADE 0 NA
61 2017 SERVICE 0 NA
62 2016 SERVICE 0 NA
63 2015 SERVICE 0 NA
64 2014 SERVICE 0 NA
65 2013 SERVICE 28018522 NA
66 2017 TRANSPORTATION 0 NA
67 2016 TRANSPORTATION 0 NA
68 2015 TRANSPORTATION 0 NA
69 2014 TRANSPORTATION 0 NA
70 2013 TRANSPORTATION 8430244 NA
71 2017 UTILITY 0 NA
72 2016 UTILITY 3551989 NA
73 2015 UTILITY 6535248 NA
74 2014 UTILITY 3995486 NA
75 2013 UTILITY 4477617 NA
76 2017 WHOLESALE TRADE 0 NA
77 2016 WHOLESALE TRADE 6898041 NA
78 2015 WHOLESALE TRADE 7120828 NA
79 2014 WHOLESALE TRADE 0 NA
80 2013 WHOLESALE TRADE 0 NA
我的代码:
df_vertical_growth %>% group_by(YEAR, VERTICAL) %>%
mutate(pct_change = ((Profit/lag(Profit) - 1) * 100))
现在,根据此处提供的答案How can I calculate the percentage change within a group for multiple columns in R?,还尝试执行以下操作:
pct <- function(x) {x / lag(x) - 1}
df_vertical_growth %>% group_by(YEAR, VERTICAL) %>% mutate_at(funs=pct,Profit)
但我收到以下错误:
check_dot_cols(.vars,.cols)出错:对象&#39;利润&#39;找不到
有人可以告诉我,我做错了什么?非常感谢。
答案 0 :(得分:7)
问题在于每个小组都有一个观察点。每个垂直一个独特的年份。一次观察的滞后是什么?此外,由于岁月按降序排列,我相信你需要领导。
library(tidyverse)
z %>%
group_by(VERTICAL) %>%
mutate(pct_change = (Profit/lead(Profit) - 1) * 100)
#output
YEAR VERTICAL Profit pct_change
<int> <fctr> <int> <dbl>
1 2017 AGRICULTURE 0 -100
2 2016 AGRICULTURE 2053358 Inf
3 2015 AGRICULTURE 0 -100
4 2014 AGRICULTURE 2370747 - 41.7
5 2013 AGRICULTURE 4066693 NA
6 2017 COMMUNICATION 0 -100
7 2016 COMMUNICATION 1680074 27.0
8 2015 COMMUNICATION 1322470 - 9.43
9 2014 COMMUNICATION 1460133 - 4.56
10 2013 COMMUNICATION 1529863 NA
此解决方案假设年份按正确顺序排列,以确保:
z %>%
group_by(VERTICAL) %>%
arrange(YEAR, .by_group = TRUE) %>%
mutate(pct_change = (Profit/lag(Profit) - 1) * 100)
#output
YEAR VERTICAL Profit pct_change
<int> <fctr> <int> <dbl>
1 2013 AGRICULTURE 4066693 NA
2 2014 AGRICULTURE 2370747 - 41.7
3 2015 AGRICULTURE 0 -100
4 2016 AGRICULTURE 2053358 Inf
5 2017 AGRICULTURE 0 -100
6 2013 COMMUNICATION 1529863 NA
7 2014 COMMUNICATION 1460133 - 4.56
8 2015 COMMUNICATION 1322470 - 9.43
9 2016 COMMUNICATION 1680074 27.0
10 2017 COMMUNICATION 0 -100
或使用
arrange(desc(YEAR), .by_group = TRUE)
和lead
z是:
structure(list(YEAR = c(2017L, 2016L, 2015L, 2014L, 2013L, 2017L,
2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L,
2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L,
2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L,
2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L,
2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L,
2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L
), VERTICAL = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L,
6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 9L,
9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L,
11L, 12L, 12L, 12L, 12L, 12L), .Label = c("AGRICULTURE", "COMMUNICATION",
"CONSTRUCTION", "EDUCATION", "HEALTHCARE", "HOSPITALITY", "MANUFACTURING",
"MINING", "OTHER", "SERVICE", "TRANSPORTATION", "UTILITY"), class = "factor"),
Profit = c(0L, 2053358L, 0L, 2370747L, 4066693L, 0L, 1680074L,
1322470L, 1460133L, 1529863L, 0L, 0L, 0L, 8250149L, 0L, 0L,
12497015L, 13437356L, 10856685L, 13881127L, 0L, 0L, 0L, 4554364L,
5078130L, 0L, 4445512L, 5499419L, 9060639L, 4391522L, 0L,
0L, 0L, 0L, 27466974L, 0L, 4359251L, 4163201L, 6272530L,
6668191L, 0L, 0L, 0L, 5935199L, 3585969L, 0L, 0L, 0L, 0L,
28018522L, 0L, 0L, 0L, 0L, 8430244L, 0L, 3551989L, 6535248L,
3995486L, 4477617L)), .Names = c("YEAR", "VERTICAL", "Profit"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9",
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
"26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36",
"37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47",
"48", "49", "50", "61", "62", "63", "64", "65", "66", "67", "68",
"69", "70", "71", "72", "73", "74", "75"), class = "data.frame")
答案 1 :(得分:0)
假设您的Profit
列代表给定年份的利润,此函数将计算年n和年n-1之间的差值,除以年n-1的值,然后乘以100得到一个百分比。如果年份n-1中的值为零,则没有有效的百分比变化。您必须仅按VERTICAL
而不是YEAR
对数据进行分组。
profit_pct_change <- function(x) {
x <- x[order(x$YEAR, decreasing = TRUE), ] # Confirms ordered by decreasing year
pct_change <- -diff(x$Profit)/x$Profit[-1] * 100 # Gets percent change in profit from preceding year
data.frame(year = x$YEAR[-length(x$YEAR)], pct_change = pct_change) # Returns data frame
}
df_vertical_growth %>%
group_by(VERTICAL) %>%
do(profit_pct_change(.))