R: calculating the average of one column for each bin of another column

Asked: 2013-08-21 18:06:48

Tags: r average binning discretization

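I have data with the two columns below. As the plot shows, the data is very noisy, so I would like to discretize column "r" into bins of width 5, assign each row to its corresponding bin, and then compute the mean of "f" for each bin.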

> dr
           r      f
1   65.06919 21.796
2   62.36986 22.836
3   59.81639 22.980
4   57.42822 22.061
5   55.22681 21.012
6   53.23533 21.274
7   51.47815 21.594
8   49.98000 22.117
9   48.76474 20.366
10  47.85394 18.991
11  47.26521 20.920
12  47.01064 20.161
13  47.09565 22.328
14  47.51842 19.610
15  48.27007 18.615
16  49.33559 21.753
17  50.69517 22.754
18  52.32590 22.096
19  54.20332 22.020
20  56.30275 22.111
21  58.60034 21.395
22  61.07373 22.635
23  63.70243 22.128
24  66.46804 21.698
25  62.24147 21.879
26  59.41380 21.637
27  56.72742 21.991
28  54.20332 21.535
29  51.86521 21.093
30  49.73932 20.496
31  47.85394 21.737
32  46.23851 21.890
33  44.92215 21.236
34  43.93177 19.997
35  43.28972 19.661
36  43.01163 20.692
37  43.10452 19.663
38  43.56604 19.273
39  44.38468 20.743
40  45.54119 22.604
41  47.01064 22.167
42  48.76474 20.427
43  50.77401 21.543
44  53.00943 21.391
45  55.44367 21.313
46  58.05170 22.501
47  60.81118 22.414
48  63.70243 22.920
49  59.54830 21.571
50  56.58622 22.454
51  53.75872 22.643
52  51.08816 20.219
53  48.60041 20.300
54  46.32494 19.832
55  44.29447 20.284
56  42.54409 21.284
57  41.10961 21.350
58  40.02499 20.784
59  39.31921 20.383
60  39.01282 20.508
61  39.11521 19.413
62  39.62323 20.043
63  40.52160 18.583
64  41.78516 19.512
65  43.38202 20.849
66  45.27693 21.349
67  47.43416 20.734
68  49.81967 22.055
69  52.40229 22.108
70  55.15433 23.184
71  58.05170 23.147
72  61.07373 23.207
73  57.00877 21.467
74  53.90733 21.549
75  50.93133 23.035
76  48.10405 20.684
77  45.45327 20.189
78  43.01163 19.304
79  40.81666 19.739
80  38.91015 20.976
81  37.33631 21.305
82  36.13862 21.319
83  35.35534 20.133
84  35.01428 20.179
85  35.12834 20.634
86  35.69314 22.478
87  36.68787 21.608
88  38.07887 20.964
89  39.82462 18.409
90  41.88078 20.627
91  44.20407 20.980
92  46.75468 22.206
93  49.49747 21.828
94  52.40229 20.844
95  55.44367 21.619
96  58.60034 21.498
97  54.64430 19.433
98  51.40039 21.293
99  48.27007 20.687
100 45.27693 21.377
101 42.44997 21.282
102 39.82462 20.910
103 37.44329 18.810
104 35.35534 21.223
105 33.61547 20.197
106 32.28002 20.765
107 31.40064 19.781
108 31.01612 20.536
109 31.14482 21.245
110 31.78050 21.117
111 32.89377 20.303
112 34.43835 20.795
113 36.35932 20.754
114 38.60052 21.025
115 41.10961 20.924
116 43.84062 21.475
117 46.75468 21.435
118 49.81967 20.380
119 53.00943 21.590
120 56.30275 20.743
121 52.47857 20.600
122 49.09175 20.818
123 45.80393 21.514
124 42.63801 21.922
125 39.62323 21.469
126 36.79674 22.186
127 34.20526 19.625
128 31.90611 19.703
129 29.96665 18.793
130 28.46050 18.912
131 27.45906 19.239
132 27.01851 18.467
133 27.16616 18.974
134 27.89265 20.090
135 29.15476 19.155
136 30.88689 20.526
137 33.01515 20.273
138 35.46830 19.956
139 38.18377 21.547
140 41.10961 21.260
141 44.20407 20.802
142 47.43416 19.719
143 50.77401 21.645
144 54.20332 18.957
145 50.53712 21.410
146 47.01064 20.536
147 43.56604 20.963
148 40.22437 20.775
149 37.01351 22.257
150 33.97058 21.868
151 31.14482 18.907
152 28.60070 19.644
153 26.41969 17.694
154 24.69818 17.883
155 23.53720 17.975
156 23.02173 18.778
157 23.19483 18.896
158 24.04163 19.561
159 25.49510 20.137
160 27.45906 19.922
161 29.83287 19.574
162 32.52691 19.029
163 35.46830 20.356
164 38.60052 20.330
165 41.88078 20.005
166 45.27693 20.006
167 48.76474 21.056
168 52.32590 20.143
169 48.84670 22.094
170 45.18849 21.252
171 41.59327 22.023
172 38.07887 21.563
173 34.66987 21.408
174 31.40064 21.334
175 28.31960 19.855
176 25.49510 18.648
177 23.02173 17.397
178 21.02380 17.311
179 19.64688 16.714
180 19.02630 18.152
181 19.23538 18.187
182 20.24846 19.910
183 21.95450 20.451
184 24.20744 19.820
185 26.87006 19.862
186 29.83287 19.987
187 33.01515 19.363
188 36.35932 19.498
189 39.82462 19.121
190 43.38202 20.479
191 47.01064 20.311
192 50.69517 21.666
193 47.43416 21.995
194 43.65776 23.158
195 39.92493 24.632
196 36.24914 23.273
197 32.64966 22.535
198 29.15476 19.933
199 25.80698 18.277
200 22.67157 16.169

[Image: plot of the data]

To walk through the process, starting from row 1: each row is assigned to a bin, so row 1 goes into bin [65-70], row 2 goes into [60-65], and so on.

For the final result, I want the midpoint of each bin together with the mean of its f values, so that I can plot f as a line against r.

2 Answers:

Answer 0 (score: 5)

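As @Fernando already mentioned in the comments, you can try cut (for the binning) together with tapply. Here the data frame is called df (the question prints it as dr):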

tapply(df$f, cut(df$r, seq(15, 70, by=5)), mean)
# (15,20]  (20,25]  (25,30]  (30,35]  (35,40]  (40,45]  (45,50]  (50,55]  (55,60]  (60,65]  (65,70] 
#17.68433 18.55918 19.28683 20.49000 20.87942 20.65430 20.96155 21.35146 21.92259 22.57414 21.74700 
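
The question also asks for the midpoint of each bin, so that the averages can be plotted against r. A minimal sketch of one way to get there, assuming the same breaks as above and using only the first six rows of the question's data as a stand-in for the full df (the names breaks, mids, means and binned are illustrative, not from the original answer):

```r
# First six rows of the question's data, standing in for the full data frame
df <- data.frame(
  r = c(65.06919, 62.36986, 59.81639, 57.42822, 55.22681, 53.23533),
  f = c(21.796, 22.836, 22.980, 22.061, 21.012, 21.274)
)

breaks <- seq(15, 70, by = 5)                  # bin edges, width 5
mids   <- head(breaks, -1) + diff(breaks) / 2  # bin midpoints: 17.5, 22.5, ..., 67.5

# Mean of f per bin; bins that contain no rows come back as NA
means <- tapply(df$f, cut(df$r, breaks), mean)

binned <- data.frame(mid = mids, f_mean = as.vector(means))
binned <- binned[!is.na(binned$f_mean), ]      # drop empty bins

# Plot the binned averages as a line with points
plot(binned$mid, binned$f_mean, type = "b", xlab = "r", ylab = "mean f")
```

With the full 200-row data frame in place of the six-row sample, binned should hold one row per non-empty bin, ready for plotting.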

Answer 1 (score: 2)

Alternatively, you can use the wonderful plyr package.

library(plyr)
# Split df by the bin of r, then average every column within each bin
ddply(df, .(cut(df$r, 5)), colwise(mean))

That said, if you had to ask the question above, you are probably better off with the tapply solution.
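
One thing to watch when comparing the two answers: a single number passed to cut is the number of bins, not the bin width. A small sketch on a made-up vector (the values in r, and the names by_count and by_width, are illustrative and not taken from the question):

```r
r <- c(19, 23, 31, 42, 48, 55, 66)  # made-up values

# A single number means "this many equal-width bins over the range of r"
by_count <- cut(r, 5)
nlevels(by_count)                    # 5 bins, each roughly (66 - 19) / 5 wide

# A vector of breaks gives bins of a fixed width instead
by_width <- cut(r, seq(15, 70, by = 5))
nlevels(by_width)                    # 11 bins, each 5 wide
```

Passing the same seq(15, 70, by = 5) breaks to cut in the plyr call should therefore give bins that match the tapply answer.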