Create a single column that is the mean of multiple columns sharing the same name

Time: 2015-02-20 20:35:04

Tags: r dataframe

I have a data frame with 400 columns and about 100 rows. Here is the head of the data frame:

           MX      MX      ID      BR      MX      FR      BR      MX      ES      FR      ES      ES      MX      FR
2/19/2015  111.45  122.46   98.16  101.20   98.60  100.74   93.15   98.61  110.69  102.28  143.21  135.32  103.30   98.50
2/12/2015  110.71  123.50   98.60  100.97   98.00  100.67   93.18   99.84  110.57  102.33  141.50  136.04  102.63   99.25
2/5/2015   111.51  125.27   99.25  101.27   97.75  100.83   93.38  101.09  111.62  102.30  145.76  137.74  102.50   96.75
1/29/2015  111.00  122.25   99.63  101.25   99.20  100.63   93.06   98.69  111.59  102.47  142.75  138.61  101.88   96.25
1/22/2015  111.39  124.00   98.13  100.55   98.92  100.52   93.00  100.21  108.99  102.46  140.96  134.14  101.75   95.75
1/15/2015  111.11  121.37   97.38  100.35   99.75  100.66   93.00  101.11  109.50  102.48  143.03  131.35  101.50   95.45

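I need to create, for each column name, one column that is the mean of all columns with that name. So I would end up with a single "MX" column that holds, for 2/19/2015, the mean of all the MX values on that date, and likewise for ID, BR, FR, and so on.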

3 answers:

Answer 0: (score: 1)

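If you are used to referring to data frame columns with the $ notation, this layout can look daunting. Remember, though, that each column can still be referenced unambiguously by its index.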

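For your specific case, you can use names to find which columns have a given name and pass those indices to rowMeans, which averages across the selected columns within each row: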

df$MX.mean <- rowMeans(df[which(names(df) == "MX")])
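
To make the indexing step concrete, here is a small sketch on a made-up data frame. The name `toy` and its values are illustrative only and are not taken from the question.

```r
# Toy data frame with a duplicated column name (hypothetical values).
toy <- data.frame(a = c(1, 2), b = c(10, 20), c = c(3, 6))
names(toy) <- c("MX", "ID", "MX")

# Positions of the columns called "MX".
idx <- which(names(toy) == "MX")
idx
# [1] 1 3

# Row-wise mean over just those columns:
# row 1 is (1 + 3) / 2 = 2, row 2 is (2 + 6) / 2 = 4.
toy$MX.mean <- rowMeans(toy[idx])
```

Selecting with `toy[idx]` rather than `toy$MX` matters here, because `$` would return only the first column with that name.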

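If you want to do this for every name that occurs in the data frame, here is a quick hack. Note that it may not be the most efficient or elegant solution; loops can almost always be avoided in R.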

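for (name in unique(names(df))){
  # Row-wise mean over every column carrying this name
  mean.col <- rowMeans(df[which(names(df) == name)])
  # paste0 avoids the space that paste() would insert before ".mean"
  df[paste0(name, ".mean")] <- mean.col
}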

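Finally, if you can avoid this problem in the first place by making sure your columns are named more sensibly, your life will probably be easier.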
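
One possible way to act on that advice, assuming you are free to rename columns after import: base R's `make.unique` appends a numeric suffix to repeated names, so every column becomes addressable by name, while the original labels can be kept aside as a grouping key. The data frame `toy` and the variables `grp` and `means` are made up for illustration.

```r
# Toy data frame with a duplicated column name (hypothetical values).
toy <- data.frame(a = c(1, 2), b = c(10, 20), c = c(3, 6))
names(toy) <- c("MX", "ID", "MX")

# Keep the original labels as the grouping key, then make the names unique.
grp <- names(toy)
names(toy) <- make.unique(grp)
names(toy)
# [1] "MX"   "ID"   "MX.1"

# Average within each original label; the result is a matrix with
# one column per distinct label.
means <- sapply(unique(grp), function(g) rowMeans(toy[grp == g]))
```

If the data are read with `read.table`, leaving `check.names` at its default of `TRUE` also de-duplicates the names at import time.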

Answer 1: (score: 0)

You can combine mapply with rowMeans:

  nm1 <- unique(names(df1))
  res <-  mapply(function(x,y) rowMeans(df1[x == y], na.rm=TRUE),
                        list(names(df1)), nm1)
  colnames(res) <- paste0(nm1, '.Mean')
  res
  #         MX.Mean ID.Mean BR.Mean   FR.Mean  ES.Mean
  #2/19/2015 106.884   98.16  97.175 100.50667 129.7400
  #2/12/2015 106.936   98.60  97.075 100.75000 129.3700
  #2/5/2015  107.624   99.25  97.325  99.96000 131.7067
  #1/29/2015 106.604   99.63  97.155  99.78333 130.9833
  #1/22/2015 107.254   98.13  96.775  99.57667 128.0300
  #1/15/2015 106.968   97.38  96.675  99.53000 127.9600

cbind the output ("res") with the original dataset. This way, the column names in "df1" are not changed.

df1 <- cbind(df1, res)
head(df1,2)
#          MX     MX    ID     BR   MX     FR    BR    MX     ES     FR
#2/19/2015 111.45 122.46 98.16 101.20 98.6 100.74 93.15 98.61 110.69 102.28
#2/12/2015 110.71 123.50 98.60 100.97 98.0 100.67 93.18 99.84 110.57 102.33
#           ES     ES     MX    FR   MX.Mean ID.Mean BR.Mean  FR.Mean ES.Mean
#2/19/2015 143.21 135.32 103.30 98.50 106.884   98.16  97.175 100.5067  129.74
#2/12/2015 141.50 136.04 102.63 99.25 106.936   98.60  97.075 100.7500  129.37

Or convert from wide format to long format, and then cast it back to wide:

 library(reshape2)
 library(splitstackshape)
 res <- dcast.data.table(getanID(melt(as.matrix(df1)), 1:2)[,
      Var2:=paste0(Var2, '.Mean')], Var1~Var2, value.var='value', mean)
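
The same wide-to-long-to-wide idea can also be sketched with base R only. This is an illustration on made-up data (`toy`, `long`, and `wide` are hypothetical names), not a drop-in replacement for the `dcast.data.table` call above.

```r
# Toy data frame with a duplicated column name (hypothetical values).
toy <- data.frame(a = c(1, 2), b = c(10, 20), c = c(3, 6),
                  row.names = c("r1", "r2"))
names(toy) <- c("MX", "ID", "MX")

# Wide -> long: one row per (row label, column label, value) triple.
m <- as.matrix(toy)
long <- data.frame(
  row   = rownames(m)[row(m)],
  name  = colnames(m)[col(m)],
  value = as.vector(m)
)

# Long -> wide: average the values within each (row, name) cell.
wide <- tapply(long$value, list(long$row, long$name), mean)
wide
```

Note that `tapply` orders rows and columns by sorted label, so with date strings as row names the row order may differ from the original data frame.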

Data

 df1 <- structure(list(MX = c(111.45, 110.71, 111.51, 111, 111.39, 
 111.11
 ), MX = c(122.46, 123.5, 125.27, 122.25, 124, 121.37), ID = c(98.16, 
 98.6, 99.25, 99.63, 98.13, 97.38), BR = c(101.2, 100.97, 101.27, 
 101.25, 100.55, 100.35), MX = c(98.6, 98, 97.75, 99.2, 98.92, 
 99.75), FR = c(100.74, 100.67, 100.83, 100.63, 100.52, 100.66
 ), BR = c(93.15, 93.18, 93.38, 93.06, 93, 93), MX = c(98.61, 
 99.84, 101.09, 98.69, 100.21, 101.11), ES = c(110.69, 110.57, 
 111.62, 111.59, 108.99, 109.5), FR = c(102.28, 102.33, 102.3, 
 102.47, 102.46, 102.48), ES = c(143.21, 141.5, 145.76, 142.75, 
 140.96, 143.03), ES = c(135.32, 136.04, 137.74, 138.61, 134.14, 
 131.35), MX = c(103.3, 102.63, 102.5, 101.88, 101.75, 101.5), 
 FR = c(98.5, 99.25, 96.75, 96.25, 95.75, 95.45)), .Names = c("MX", 
"MX", "ID", "BR", "MX", "FR", "BR", "MX", "ES", "FR", "ES", "ES", 
"MX", "FR"), class = "data.frame", row.names = c("2/19/2015", 
"2/12/2015", "2/5/2015", "1/29/2015", "1/22/2015", "1/15/2015"))

Answer 2: (score: 0)

Another option, using vapply():

vapply(
    unique(names(df)), 
    function(x) rowMeans(df[grepl(x, names(df), fixed = TRUE)]),
    double(nrow(df))
)
#                MX    ID     BR        FR       ES
# 2/19/2015 106.884 98.16 97.175 100.50667 129.7400
# 2/12/2015 106.936 98.60 97.075 100.75000 129.3700
# 2/5/2015  107.624 99.25 97.325  99.96000 131.7067
# 1/29/2015 106.604 99.63 97.155  99.78333 130.9833
# 1/22/2015 107.254 98.13 96.775  99.57667 128.0300
# 1/15/2015 106.968 97.38 96.675  99.53000 127.9600
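
One caveat worth illustrating: `grepl(x, ..., fixed = TRUE)` tests whether `x` occurs anywhere in each name, so it also selects longer names that contain it. With the two-letter names shown in the question this makes no difference, but if the full data set contained, say, both `MX` and a hypothetical `MXN`, an exact comparison would be safer. A minimal sketch on made-up names:

```r
nms <- c("MX", "MXN", "ID")  # "MXN" is a hypothetical name, not from the question

# Substring match: "MXN" is selected too.
sub_match <- grepl("MX", nms, fixed = TRUE)
sub_match
# [1]  TRUE  TRUE FALSE

# Exact match: only "MX" is selected.
exact_match <- nms == "MX"
exact_match
# [1]  TRUE FALSE FALSE
```

Swapping `grepl(x, names(df), fixed = TRUE)` for `names(df) == x` in the `vapply()` call gives the same result on the sample data.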

where df is:

df <- read.table(check.names = FALSE, header = TRUE, text = "MX      MX      ID      BR      MX      FR      BR      MX      ES      FR      ES      ES      MX      FR
2/19/2015  111.45  122.46   98.16  101.20   98.60  100.74   93.15   98.61  110.69  102.28  143.21  135.32  103.30   98.50
2/12/2015  110.71  123.50   98.60  100.97   98.00  100.67   93.18   99.84  110.57  102.33  141.50  136.04  102.63   99.25
2/5/2015   111.51  125.27   99.25  101.27   97.75  100.83   93.38  101.09  111.62  102.30  145.76  137.74  102.50   96.75
1/29/2015  111.00  122.25   99.63  101.25   99.20  100.63   93.06   98.69  111.59  102.47  142.75  138.61  101.88   96.25
1/22/2015  111.39  124.00   98.13  100.55   98.92  100.52   93.00  100.21  108.99  102.46  140.96  134.14  101.75   95.75
1/15/2015  111.11  121.37   97.38  100.35   99.75  100.66   93.00  101.11  109.50  102.48  143.03  131.35  101.50   95.45")