I have a large dataset with different subsets (subjects) and a large number of variables.
The data looks like this:
set.seed(362)
Day <- rep(1:35, times = 3)
variable.1 <- round(rnorm(n = length(Day), mean = 1200, sd = 300), 0)
variable.2 <- round(rnorm(n = length(Day), mean = 100, sd = 20), 0)
variable.3 <- round(rnorm(n = length(Day), mean = 20, sd = 5), 1)
data <- data.frame(Day, variable.1, variable.2, variable.3)
# One subject label per row: 35 days each for Bob, Jeff and Kevin
Names <- rep(c("Bob", "Jeff", "Kevin"), each = 35)
data <- cbind(Names, data)
data
Names Day variable.1 variable.2 variable.3
1 Bob 1 1369 91 20.6
2 Bob 2 1155 96 18.8
3 Bob 3 999 97 22.4
4 Bob 4 947 93 11.4
5 Bob 5 1442 90 20.1
6 Bob 6 1170 125 17.8
7 Bob 7 1028 81 16.0
8 Bob 8 893 115 30.6
9 Bob 9 1413 76 18.2
10 Bob 10 1510 126 18.8
11 Bob 11 1145 117 19.7
12 Bob 12 1893 83 11.0
13 Bob 13 1559 122 21.9
14 Bob 14 1396 91 27.4
15 Bob 15 1066 105 29.2
16 Bob 16 1319 31 31.4
17 Bob 17 959 134 25.0
18 Bob 18 1325 108 11.8
19 Bob 19 1278 93 17.0
20 Bob 20 909 70 16.2
21 Bob 21 777 84 23.3
22 Bob 22 1770 105 11.6
23 Bob 23 1080 79 14.6
24 Bob 24 855 70 18.7
25 Bob 25 1192 84 15.1
26 Bob 26 1077 116 18.6
27 Bob 27 1376 120 19.6
28 Bob 28 1290 107 20.8
29 Bob 29 1150 96 16.4
30 Bob 30 991 111 22.0
31 Bob 31 1433 113 16.0
32 Bob 32 1125 104 17.8
33 Bob 33 1076 122 21.6
34 Bob 34 1491 113 24.1
35 Bob 35 1163 102 20.0
36 Jeff 1 1151 75 19.4
37 Jeff 2 1375 87 24.7
38 Jeff 3 1508 106 19.1
39 Jeff 4 1569 84 15.4
40 Jeff 5 1279 88 13.5
41 Jeff 6 664 116 21.1
42 Jeff 7 987 69 24.7
43 Jeff 8 1913 121 20.6
44 Jeff 9 1320 99 17.4
45 Jeff 10 1384 126 25.6
46 Jeff 11 1067 118 22.6
47 Jeff 12 1060 81 20.9
48 Jeff 13 1732 97 19.5
49 Jeff 14 1097 112 17.1
50 Jeff 15 1521 105 13.4
51 Jeff 16 1139 123 19.0
52 Jeff 17 996 99 20.8
53 Jeff 18 713 127 27.0
54 Jeff 19 1586 91 15.4
55 Jeff 20 777 119 17.7
56 Jeff 21 1232 106 22.4
57 Jeff 22 1415 116 25.9
58 Jeff 23 1256 117 23.1
59 Jeff 24 955 97 24.3
60 Jeff 25 1503 105 21.1
61 Jeff 26 1965 80 17.6
62 Jeff 27 1281 112 29.7
63 Jeff 28 1467 122 23.8
64 Jeff 29 939 118 17.4
65 Jeff 30 1288 91 20.9
66 Jeff 31 1441 99 16.6
67 Jeff 32 1310 75 23.2
68 Jeff 33 1155 112 22.9
69 Jeff 34 1357 94 29.0
70 Jeff 35 1378 81 26.6
71 Kevin 1 1185 70 16.8
72 Kevin 2 1709 115 24.2
73 Kevin 3 1050 111 10.5
74 Kevin 4 1474 104 10.7
75 Kevin 5 1016 75 20.4
76 Kevin 6 630 98 18.6
77 Kevin 7 949 53 16.4
78 Kevin 8 1284 118 16.6
79 Kevin 9 1255 87 7.7
80 Kevin 10 406 105 14.1
81 Kevin 11 1182 110 16.8
82 Kevin 12 803 73 27.7
83 Kevin 13 960 84 20.3
84 Kevin 14 1192 91 20.6
85 Kevin 15 749 104 22.2
86 Kevin 16 848 106 17.2
87 Kevin 17 1567 77 12.4
88 Kevin 18 1026 127 19.3
89 Kevin 19 1384 93 19.4
90 Kevin 20 1024 96 17.1
91 Kevin 21 1226 105 12.6
92 Kevin 22 1629 110 10.1
93 Kevin 23 1197 64 24.1
94 Kevin 24 1286 82 17.0
95 Kevin 25 1104 103 26.2
96 Kevin 26 1056 108 25.1
97 Kevin 27 1481 145 10.7
98 Kevin 28 949 124 18.8
99 Kevin 29 1230 152 13.5
100 Kevin 30 1481 78 15.4
101 Kevin 31 1437 83 25.0
102 Kevin 32 1446 81 21.3
103 Kevin 33 1501 101 20.0
104 Kevin 34 1288 103 17.8
105 Kevin 35 1338 109 25.0
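I created a per-subject percent change for "variable.1" using the Delt function from the quantmod package, applying it with the ddply function from the plyr package (conditioning on subject):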
library(quantmod)  # provides Delt()
library(plyr)      # provides ddply()
data <- ddply(data, "Names", transform,
variable.1.Percent.Change = round(Delt(variable.1)*100,1))
Names Day variable.1 variable.2 variable.3 Delt.1.arithmetic
1 Bob 1 1369 91 20.6 NA
2 Bob 2 1155 96 18.8 -15.6
3 Bob 3 999 97 22.4 -13.5
4 Bob 4 947 93 11.4 -5.2
5 Bob 5 1442 90 20.1 52.3
6 Bob 6 1170 125 17.8 -18.9
7 Bob 7 1028 81 16.0 -12.1
8 Bob 8 893 115 30.6 -13.1
9 Bob 9 1413 76 18.2 58.2
10 Bob 10 1510 126 18.8 6.9
11 Bob 11 1145 117 19.7 -24.2
12 Bob 12 1893 83 11.0 65.3
13 Bob 13 1559 122 21.9 -17.6
14 Bob 14 1396 91 27.4 -10.5
15 Bob 15 1066 105 29.2 -23.6
16 Bob 16 1319 31 31.4 23.7
17 Bob 17 959 134 25.0 -27.3
18 Bob 18 1325 108 11.8 38.2
19 Bob 19 1278 93 17.0 -3.5
20 Bob 20 909 70 16.2 -28.9
21 Bob 21 777 84 23.3 -14.5
22 Bob 22 1770 105 11.6 127.8
23 Bob 23 1080 79 14.6 -39.0
24 Bob 24 855 70 18.7 -20.8
25 Bob 25 1192 84 15.1 39.4
26 Bob 26 1077 116 18.6 -9.6
27 Bob 27 1376 120 19.6 27.8
28 Bob 28 1290 107 20.8 -6.2
29 Bob 29 1150 96 16.4 -10.9
30 Bob 30 991 111 22.0 -13.8
31 Bob 31 1433 113 16.0 44.6
32 Bob 32 1125 104 17.8 -21.5
33 Bob 33 1076 122 21.6 -4.4
34 Bob 34 1491 113 24.1 38.6
35 Bob 35 1163 102 20.0 -22.0
36 Jeff 1 1151 75 19.4 NA
37 Jeff 2 1375 87 24.7 19.5
38 Jeff 3 1508 106 19.1 9.7
39 Jeff 4 1569 84 15.4 4.0
40 Jeff 5 1279 88 13.5 -18.5
41 Jeff 6 664 116 21.1 -48.1
42 Jeff 7 987 69 24.7 48.6
43 Jeff 8 1913 121 20.6 93.8
44 Jeff 9 1320 99 17.4 -31.0
45 Jeff 10 1384 126 25.6 4.8
46 Jeff 11 1067 118 22.6 -22.9
47 Jeff 12 1060 81 20.9 -0.7
48 Jeff 13 1732 97 19.5 63.4
49 Jeff 14 1097 112 17.1 -36.7
50 Jeff 15 1521 105 13.4 38.7
51 Jeff 16 1139 123 19.0 -25.1
52 Jeff 17 996 99 20.8 -12.6
53 Jeff 18 713 127 27.0 -28.4
54 Jeff 19 1586 91 15.4 122.4
55 Jeff 20 777 119 17.7 -51.0
56 Jeff 21 1232 106 22.4 58.6
57 Jeff 22 1415 116 25.9 14.9
58 Jeff 23 1256 117 23.1 -11.2
59 Jeff 24 955 97 24.3 -24.0
60 Jeff 25 1503 105 21.1 57.4
61 Jeff 26 1965 80 17.6 30.7
62 Jeff 27 1281 112 29.7 -34.8
63 Jeff 28 1467 122 23.8 14.5
64 Jeff 29 939 118 17.4 -36.0
65 Jeff 30 1288 91 20.9 37.2
66 Jeff 31 1441 99 16.6 11.9
67 Jeff 32 1310 75 23.2 -9.1
68 Jeff 33 1155 112 22.9 -11.8
69 Jeff 34 1357 94 29.0 17.5
70 Jeff 35 1378 81 26.6 1.5
71 Kevin 1 1185 70 16.8 NA
72 Kevin 2 1709 115 24.2 44.2
73 Kevin 3 1050 111 10.5 -38.6
74 Kevin 4 1474 104 10.7 40.4
75 Kevin 5 1016 75 20.4 -31.1
76 Kevin 6 630 98 18.6 -38.0
77 Kevin 7 949 53 16.4 50.6
78 Kevin 8 1284 118 16.6 35.3
79 Kevin 9 1255 87 7.7 -2.3
80 Kevin 10 406 105 14.1 -67.6
81 Kevin 11 1182 110 16.8 191.1
82 Kevin 12 803 73 27.7 -32.1
83 Kevin 13 960 84 20.3 19.6
84 Kevin 14 1192 91 20.6 24.2
85 Kevin 15 749 104 22.2 -37.2
86 Kevin 16 848 106 17.2 13.2
87 Kevin 17 1567 77 12.4 84.8
88 Kevin 18 1026 127 19.3 -34.5
89 Kevin 19 1384 93 19.4 34.9
90 Kevin 20 1024 96 17.1 -26.0
91 Kevin 21 1226 105 12.6 19.7
92 Kevin 22 1629 110 10.1 32.9
93 Kevin 23 1197 64 24.1 -26.5
94 Kevin 24 1286 82 17.0 7.4
95 Kevin 25 1104 103 26.2 -14.2
96 Kevin 26 1056 108 25.1 -4.3
97 Kevin 27 1481 145 10.7 40.2
98 Kevin 28 949 124 18.8 -35.9
99 Kevin 29 1230 152 13.5 29.6
100 Kevin 30 1481 78 15.4 20.4
101 Kevin 31 1437 83 25.0 -3.0
102 Kevin 32 1446 81 21.3 0.6
103 Kevin 33 1501 101 20.0 3.8
104 Kevin 34 1288 103 17.8 -14.2
105 Kevin 35 1338 109 25.0 3.9
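For reference, the Delt.1.arithmetic values above match a one-step arithmetic change, (x_t - x_{t-1}) / x_{t-1} * 100, computed within each subject. The base-R sketch below reproduces that column without quantmod; the helper name `pct_change` is hypothetical, and it assumes the rows are already sorted by Day within each subject.

```r
# Hypothetical helper: one-step arithmetic percent change, first value NA
pct_change <- function(x) c(NA, diff(x) / head(x, -1)) * 100

# Bob's first five values of variable.1 from the table above
bob <- data.frame(Names = "Bob", variable.1 = c(1369, 1155, 999, 947, 1442))

# ave() applies pct_change separately to each subject and returns a plain numeric vector
bob$Percent.Change <- round(ave(bob$variable.1, bob$Names, FUN = pct_change), 1)
bob$Percent.Change
```

On the full data, `ave(data$variable.1, data$Names, FUN = pct_change)` should give the same values as the Delt.1.arithmetic column, up to rounding.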
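What I want to do is create another column that, instead of holding a single day's percent change, holds the sum of all the percent changes above it. I tried something with a lag function and sum, but it only adds up the percent changes of adjacent days. I want the percent changes to keep accumulating as a running total. For example, for the first subject, Bob, his percent changes are: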
NA
-15.6
-13.5
-5.2
52.3
etc.
I'd like a column next to it that reads:
Percent Change Percent Addition
NA NA
-15.6 -15.6
-13.5 -29.1
-5.2 -34.3
52.3 18
I can't seem to figure out the best way to do this and then apply it across a large group of subjects (the way I did with the percent change).
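One way to get the "Percent Addition" column is a cumulative sum per subject that skips NAs in the sum but keeps them as NA in the output. This is a minimal sketch using base R's `ave()` rather than plyr; the helper `running_sum` and the small data frame (Bob's first five and Jeff's first two changes from the table) are for illustration only.

```r
# Hypothetical helper: running total that treats NA as 0 in the sum but keeps NA in the output
running_sum <- function(p) {
  total <- cumsum(ifelse(is.na(p), 0, p))  # plain cumsum() would turn every value after an NA into NA
  total[is.na(p)] <- NA                    # restore the original NA positions
  total
}

# Bob's first five and Jeff's first two percent changes from the table above
d <- data.frame(Names = c(rep("Bob", 5), "Jeff", "Jeff"),
                Delt.1.arithmetic = c(NA, -15.6, -13.5, -5.2, 52.3, NA, 19.5))

# ave() splits by Names, applies running_sum to each piece, and returns values in the original row order
d$Percent.Addition <- ave(d$Delt.1.arithmetic, d$Names, FUN = running_sum)
d
```

Because `ave()` puts results back in the original row order, this does not depend on the rows being sorted by subject or on every subject having the same number of days; on the full data it would be `data$Percent.Addition <- ave(data$Delt.1.arithmetic, data$Names, FUN = running_sum)`.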
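Also, as a side note, does anyone know why, when I apply the Delt function, the column is named Delt.1.arithmetic instead of the column name I set, and why for some reason I can't rename it with colnames(data)?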
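On the naming side note: Delt() appears to return a one-column matrix whose column is already named Delt.1.arithmetic, and `data.frame()` (which `transform()` uses to add new columns) keeps a one-column matrix's own column name instead of the argument name. The sketch below uses a plain matrix `m` as a stand-in for Delt()'s output, so no quantmod call is involved.

```r
# Stand-in for Delt()'s return value: a one-column matrix carrying its own column name
m <- matrix(c(NA, -15.6, -13.5), ncol = 1, dimnames = list(NULL, "Delt.1.arithmetic"))

# The matrix's column name wins over the argument name
with_matrix <- data.frame(Day = 1:3, variable.1.Percent.Change = m)
names(with_matrix)

# Dropping the matrix wrapper first lets the argument name stick
with_vector <- data.frame(Day = 1:3, variable.1.Percent.Change = as.numeric(m))
names(with_vector)
```

By the same reasoning, writing `variable.1.Percent.Change = round(as.numeric(Delt(variable.1)) * 100, 1)` inside the `ddply()` call should keep the name you chose.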
Answer 0 (score: 0)
Try:
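# Temporarily treat NAs as 0 so cumsum() does not propagate them
data$Delt.1.arithmetic[is.na(data$Delt.1.arithmetic)] <- 0
# Running sum within each subject; with equal group sizes, column x is a subjects-by-days matrix
percent_addition <- aggregate(x = data$Delt.1.arithmetic, by = list(data$Names), FUN = cumsum)
# Drop the grouping column, keeping only the matrix of cumulative sums
percent_addition <- percent_addition[, -1]
# Transpose and flatten subject by subject (assumes data is sorted by Names)
percent_addition <- c(t(x = percent_addition))
# Turn the zeros back into NAs
data$Delt.1.arithmetic[data$Delt.1.arithmetic == 0] <- NA
percent_addition[percent_addition == 0] <- NA
data <- cbind(data, percent_addition)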
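The code above works when I apply it to your data frame. One caveat: it assumes NAs and 0s are interchangeable, so a genuine 0 percent change, or a running total that happens to be exactly 0, will come back as NA. It also relies on every subject having the same number of rows and on the rows being sorted by Names.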
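To make the NA/0 caveat concrete, the sketch below runs the answer's steps on a small hypothetical data frame in which Amy has a genuine 0 % change on day 3 and Ben's running total returns to exactly 0 on day 3. Both values end up as NA.

```r
# Hypothetical toy data: two subjects, three days each, already sorted by Names
d <- data.frame(Names = rep(c("Amy", "Ben"), each = 3),
                Delt.1.arithmetic = c(NA, 10, 0, NA, 5, -5))

# The answer's steps: NA -> 0, per-subject cumsum, flatten, then 0 -> NA
d$Delt.1.arithmetic[is.na(d$Delt.1.arithmetic)] <- 0
pa <- aggregate(x = d$Delt.1.arithmetic, by = list(d$Names), FUN = cumsum)
pa <- c(t(pa[, -1]))
d$Delt.1.arithmetic[d$Delt.1.arithmetic == 0] <- NA
pa[pa == 0] <- NA

res <- cbind(d, percent_addition = pa)
res  # Amy's real 0 change and Ben's 0 running total are both NA now
```

An NA-aware per-subject cumulative sum (for example via `ave()`) avoids this because it never converts NAs to 0 and back.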