Mean of multiple rows by name or row number in R

Date: 2014-10-15 08:24:25

Tags: r

My dataset in R looks like this:

             LD.D        LD.L            LD.P
Y.1992.a1   67.89552605 33.21192862 90.7750688
Y.1992.a2   227.1370541 79.67211036 154.5165077
Y.1992.a3   94.5326718  24.72816922 151.665545
Y.1992.a4   106.8793485 56.07635245 100.6711004
Y.1992.a5   97.41402289 46.93434073 100.8787496
Y.1993.a1   150.045093  19.64290196 27.81953228
Y.1993.a2   106.5888189 21.38886866 84.82532249
Y.1993.a3   110.7493543 25.41765759 70.02222315
Y.1993.a4   237.1246502 16.43006029 75.17407065
Y.1993.a5   234.5403261 16.93082727 49.01639754
Y.1994.a1   94.5326718  24.72816922 151.665545
Y.1994.a2   106.8793485 56.07635245 100.6711004
Y.1994.a3   97.41402289 46.93434073 100.8787496
Y.1994.a4   150.045093  19.64290196 27.81953228
Y.1994.a5   106.5888189 21.38886866 84.82532249

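I have five replicates for each year. How can I get the mean for each year (that is, for 1992, 1993, and 1994)?

2 answers:

Answer 0 (score: 3)

You can do this in base R, or with specialized packages such as dplyr or data.table (which are more efficient when the dataset is very large).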

# Extract the year from the row names and store it in a new column `Year`
df$Year <- gsub("^.\\.(\\d+)\\..*", "\\1", row.names(df))
library(dplyr)
df %>%
    group_by(Year) %>%  # group by the Year variable
    # summarise_each() applies the function (here, mean) to every non-grouping
    # column, or to a subset of columns if one is specified.
    # Note: summarise_each() and funs() are deprecated in newer dplyr releases.
    summarise_each(funs(mean = mean(., na.rm = TRUE)))

 #    Source: local data frame [3 x 4]

 #  Year     LD.D     LD.L      LD.P
 #1 1992 118.7717 48.12458 119.70139
 #2 1993 167.8096 19.96206  61.37151
 #3 1994 111.0920 33.75413  93.17205
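Since `summarise_each()` and `funs()` are deprecated in recent dplyr versions, the same grouped mean can be sketched with `across()`. This is only an illustration, assuming dplyr 1.0 or later; the `toy` data frame and its values are hypothetical stand-ins for the real `df`.

```r
library(dplyr)

# Hypothetical toy data with replicate labels stored as row names
toy <- data.frame(
  LD.D = c(1, 3, 10, 20),
  LD.L = c(2, 4, 30, 50),
  row.names = c("Y.1992.a1", "Y.1992.a2", "Y.1993.a1", "Y.1993.a2")
)

# Pull the year out of row names shaped like "Y.<year>.<replicate>"
toy$Year <- sub("^Y\\.(\\d+)\\..*$", "\\1", rownames(toy))

# Mean of every non-grouping column within each year
res <- toy %>%
  group_by(Year) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))

print(res)
```

`across(everything(), ...)` skips the grouping column automatically, so only `LD.D` and `LD.L` are averaged here.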

Using data.table: convert the data frame to a data.table with setDT, then use lapply to take the mean of each column of .SD (Subset of Data.table). Specify the grouping variable with by.

library(data.table)
setDT(df)[, lapply(.SD, mean, na.rm=TRUE), by=Year]
#    Year     LD.D     LD.L      LD.P
#1: 1992 118.7717 48.12458 119.70139
#2: 1993 167.8096 19.96206  61.37151
#3: 1994 111.0920 33.75413  93.17205
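One detail worth noting is that a data.table does not keep row names, so the year has to be extracted before (or during) the conversion. The sketch below shows one possible way to do that with `as.data.table(..., keep.rownames = TRUE)`, which stores the row names in a column called `rn`. The `toy` data and the use of `.SDcols` are illustrative assumptions, not part of the original answer.

```r
library(data.table)

# Hypothetical toy data with replicate labels stored as row names
toy <- data.frame(
  LD.D = c(1, 3, 10, 20),
  LD.L = c(2, 4, 30, 50),
  row.names = c("Y.1992.a1", "Y.1992.a2", "Y.1993.a1", "Y.1993.a2")
)

# keep.rownames = TRUE copies the row names into a column named "rn"
dt <- as.data.table(toy, keep.rownames = TRUE)

# Derive the grouping variable from the preserved labels
dt[, Year := sub("^Y\\.(\\d+)\\..*$", "\\1", rn)]

# Average only the measurement columns within each year
res <- dt[, lapply(.SD, mean, na.rm = TRUE), by = Year,
          .SDcols = c("LD.D", "LD.L")]

print(res)
```

Restricting `.SDcols` to the numeric columns keeps the character column `rn` out of the `mean()` calls.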

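base R

There are also several ways to do this in base R, using aggregate, by, split, and so on. Here is one using by. Use a regex (lookbehind) to get the Year. In this case I also keep the Y prefix, since it doesn't affect the result.

# Run this on the original df, before a Year column is added:
# colMeans() needs all columns to be numeric
Year <- gsub("(?<=[0-9])\\..*$", "", row.names(df), perl=TRUE)
do.call(`rbind`, by(df, Year, FUN=colMeans, na.rm=TRUE))
#           LD.D     LD.L      LD.P
#Y.1992 118.7717 48.12458 119.70139
#Y.1993 167.8096 19.96206  61.37151
#Y.1994 111.0920 33.75413  93.17205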
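The answer mentions split as another base R route. A minimal sketch of that variant might look like the following, using a hypothetical `toy` data frame in place of `df`; the grouping labels keep the `Y.` prefix, as in the by() approach.

```r
# Hypothetical toy data with replicate labels stored as row names
toy <- data.frame(
  LD.D = c(1, 3, 10, 20),
  LD.L = c(2, 4, 30, 50),
  row.names = c("Y.1992.a1", "Y.1992.a2", "Y.1993.a1", "Y.1993.a2")
)

# Drop the trailing ".<replicate>" part, leaving "Y.1992", "Y.1993"
Year <- sub("\\.[^.]*$", "", rownames(toy))

# split() makes one data frame per year; colMeans() averages each one.
# sapply() returns variables in rows, so transpose to get years in rows.
res <- t(sapply(split(toy, Year), colMeans, na.rm = TRUE))

print(res)
```

The result is a numeric matrix with one row per year, which can be wrapped in `as.data.frame()` if a data frame is preferred.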

Data

 df <- structure(list(LD.D = c(67.89552605, 227.1370541, 94.5326718, 
 106.8793485, 97.41402289, 150.045093, 106.5888189, 110.7493543, 
 237.1246502, 234.5403261, 94.5326718, 106.8793485, 97.41402289, 
 150.045093, 106.5888189), LD.L = c(33.21192862, 79.67211036, 
 24.72816922, 56.07635245, 46.93434073, 19.64290196, 21.38886866, 
 25.41765759, 16.43006029, 16.93082727, 24.72816922, 56.07635245, 
 46.93434073, 19.64290196, 21.38886866), LD.P = c(90.7750688, 
 154.5165077, 151.665545, 100.6711004, 100.8787496, 27.81953228, 
 84.82532249, 70.02222315, 75.17407065, 49.01639754, 151.665545, 
 100.6711004, 100.8787496, 27.81953228, 84.82532249)), .Names = c("LD.D", 
 "LD.L", "LD.P"), class = "data.frame", row.names = c("Y.1992.a1", 
 "Y.1992.a2", "Y.1992.a3", "Y.1992.a4", "Y.1992.a5", "Y.1993.a1", 
 "Y.1993.a2", "Y.1993.a3", "Y.1993.a4", "Y.1993.a5", "Y.1994.a1", 
 "Y.1994.a2", "Y.1994.a3", "Y.1994.a4", "Y.1994.a5"))

Answer 1 (score: 1)

Try aggregate, where DF is the data frame:

aggregate(DF, list(Year = gsub("^Y\\.|\\.[^.]*$", "", rownames(DF))), mean)
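If some replicates are missing, extra arguments such as `na.rm = TRUE` can be passed through `aggregate()` to `mean()`. The sketch below illustrates this on a hypothetical `toy` data frame with one `NA`; the regex uses escaped dots so that only literal periods are matched.

```r
# Hypothetical toy data; one replicate is missing (NA)
toy <- data.frame(
  LD.D = c(1, NA, 10, 20),
  LD.L = c(2, 4, 30, 50),
  row.names = c("Y.1992.a1", "Y.1992.a2", "Y.1993.a1", "Y.1993.a2")
)

# Strip the leading "Y." and the trailing ".<replicate>" to leave the year
Year <- gsub("^Y\\.|\\.[^.]*$", "", rownames(toy))

# Arguments after FUN are forwarded to it, so na.rm reaches mean()
res <- aggregate(toy, by = list(Year = Year), FUN = mean, na.rm = TRUE)

print(res)
```

Without `na.rm = TRUE`, the 1992 mean of `LD.D` in this toy example would come out as `NA`.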