组合多个数据帧并计算平均值

时间:2015-07-16 22:10:59

标签: r dataframe mean

我有三个数据框如下。我想根据Lon& amp;将它们组合成一个数据帧。 Lat,并平均每个'单元'的3个值。我已经读过这个(calculate average over multiple data frames)并试图利用聚合但无济于事....任何指针都赞赏。

> head(CSR.GRACE[,c(1:14)],10)
    Lon  Lat   January  February     March     April       May     June        July     August  September   October    November  December
1  28.5 -4.5 17.710425 13.855327 12.385712 13.558101 12.789865 6.913783  1.03770075 -5.3901741 -6.6351015 -7.661375 -3.09337944 6.0659410
2  29.5 -4.5 14.010154 10.257435  9.009641 10.275778  9.598241 5.166972  0.73570247 -4.2733162 -5.0861417 -5.850192 -2.93521806 4.1240150
3  30.5 -4.5 16.288443 10.467614  9.275714 10.904162 10.228808 5.364853  0.50089883 -4.7478741 -5.4320069 -6.316568 -3.80160315 3.8494745
4  31.5 -4.5 18.560677  9.932461  9.239592 11.037748 10.551886 5.281853  0.01181973 -4.9034324 -5.3504391 -6.438050 -4.41695714 3.3432301
5  32.5 -4.5 10.171202  4.476512  4.509140  5.448872  5.338991 2.556262 -0.22646611 -2.3274204 -2.4376636 -3.103697 -2.27586145 1.3641930
6  33.5 -4.5 14.040068  5.349344  5.772618  7.158792  7.121341 3.407587 -0.30616689 -2.6800099 -2.7955420 -3.803622 -2.77898997 1.4021380

> head(GFZ.GRACE[,c(1:14)],10)
    Lon  Lat   January  February     March     April       May      June     July     August September   October   November  December
1  28.5 -4.5 15.642782 15.521720 11.823875 19.825865 17.335761 11.208188 5.080615 -3.0897644 -5.733351 -4.196604 -1.6697661 10.744696
2  29.5 -4.5 12.164074 10.931418  8.622238 15.341911 12.969769  8.521280 4.072790 -2.4301791 -4.551170 -3.055914 -1.2260079  7.592880
3  30.5 -4.5 13.579305 10.267520  8.787406 16.567715 13.745143  9.121496 4.497849 -2.6723491 -5.022949 -3.269881 -1.0691039  7.377143
4  31.5 -4.5 14.501465  8.600480  8.259757 16.981533 14.054429  9.318550 4.582672 -2.7917893 -5.249895 -3.636936 -0.5141342  6.770836
5  32.5 -4.5  7.311216  3.249596  3.513870  8.430777  6.941659  4.572560 2.203461 -1.4106516 -2.661226 -2.113089  0.2459282  3.049897
6  33.5 -4.5  9.121348  3.113245  3.584976 11.040761  8.732950  5.772059 2.811168 -1.8554437 -3.524447 -3.272863  1.2493973  3.750694

> head(JPL.GRACE[,c(1:14)],10)
    Lon  Lat   January  February     March     April       May     June     July     August  September    October   November   December
1  28.5 -4.5 19.559790 14.544438 12.035112 13.944141 11.931011 7.513007 3.095003 -3.6165702 -6.5945043 -7.2498567 -4.5402436  6.3935236
2  29.5 -4.5 15.740160 11.192191  8.549782 10.783359  9.401173 5.834498 2.267822 -2.6354346 -4.8939197 -5.5912996 -3.7295148  4.1461123
3  30.5 -4.5 18.984714 12.014807  8.510139 11.628697 10.635699 6.448064 2.260429 -2.6979695 -5.2102337 -6.2646164 -4.2713238  3.5089825
4  31.5 -4.5 22.794356 11.993054  8.162500 11.813746 11.747350 6.955983 2.164615 -2.5707902 -5.3448873 -6.7473006 -4.5777496  2.5609555
5  32.5 -4.5 13.233634  5.606305  3.880347  5.753024  6.388978 3.742596 1.096214 -1.1103189 -2.6367831 -3.4102675 -2.2860237  0.7826054
6  33.5 -4.5 19.260989  6.761722  4.978247  7.373498  9.135645 5.421030 1.706414 -1.0796434 -3.3122886 -4.2114588 -2.8110246  0.4825075

5 个答案:

答案 0 :(得分:9)

你可以这样做:

library(data.table)

rbindlist(list(JPL.GRACE,GFZ.GRACE,CSR.GRACE))[,lapply(.SD,mean), list(Lon, Lat)]

<强>说明

data.frames list被置于rbindlist和&#39;叠加的地平线&#39;使用data.table(返回data.frame)。我们这样做是因为您的do.call(rbind, list(JPL.GRACE,GFZ.GRACE,CSR.GRACE))具有相同的结构(相同的列数和名称,相同类型的数据)。 另一种方法是Lon, Lat

然后我们遍历每个不同的.SD对。 data.table表示与每对关联的dt = rbindlist(list(JPL.GRACE,GFZ.GRACE,CSR.GRACE)) dt[,print(.SD), list(Lon, Lat)] 。你可以这样做:

.SD

对于这些s_dict中的每一个,我们只需循环遍历列并计算均值。

答案 1 :(得分:2)

由于您的数据全部为数字,因此您可以将其放入3d数组并使用rowMeans

library(abind)

arr = abind(JPL.GRACE, GFZ.GRACE, CSR.GRACE, along = 3)
rowMeans(arr, dims = 2)

答案 2 :(得分:2)

使用1:2作为&#34; MARGIN&#34;

,使用3宽阵列可以非常轻松地完成此操作
library(abind)
temp_array <- abind(CSR.GRACE, GFZ.GRACE, JPL.GRACE, along=3)
res <- apply(temp_array, 1:2, mean)

这是一个简单的例子:

 install.packages('abind')
 x <- matrix(1:12,3,4)
 y <- x+100; z= y-50
 apply( abind::abind(x,y,z, along=3),  1:2, mean)
     [,1] [,2] [,3] [,4]
[1,]   51   54   57   60
[2,]   52   55   58   61
[3,]   53   56   59   62

答案 3 :(得分:1)

也许sqldf package可以帮到你:

library(sqldf)
df1 <- CSR.GRACE[,c(1:14)]
df2 <- GFZ.GRACE[,c(1:14)]
df3 <- JPL.GRACE[,c(1:14)]
# This could be done with rbind(), but I'll use sqldf()
# I'm assuming all data frames have the same columns:
df <- sqldf('select * from df1 
             union all select * from df2 
             union all select * from df3')
# The average can be done also with sqldf (just a demo)
sqldf('select Lon, Lat, avg(January) as jan, avg(February) as feb
       from df
       group by Lon, Lat')

可能有更好的解决方案,但这是一种简单的方法。

希望这有帮助

答案 4 :(得分:0)

这是一个基本的 R 解决方案:

r <- do.call(rbind, list(JPL.GRACE,GFZ.GRACE,CSR.GRACE))
r <- aggregate(r[-c(1,2)], list(r$Lon, r$Lat), mean)