Question

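I have a data frame with about 50 numeric variables. I want to create a new column containing the mean of a subset of these variables that belong to the same category. For example, I might want to create a new variable called df$mean_weight that holds, for each respondent (row), the mean of df$weight1, df$weight2, and df$weight3, and likewise for the height variables and so on.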

Here is what I have so far:

find_mean = function(...) {
  input_list = list(...)
  output_list = sapply(input_list,mean, na.rm=TRUE)
  return(output_list)
}

df$mean_weight = find_mean(df$weight1, df$weight2, df$weight3)

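The problem is that this gives me an error saying the replacement has fewer rows than the original data. For some reason, the error does not occur when I run the same code on the height variables.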

Answer 1

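I cannot reproduce your error. The function runs without error on a sample dataset I generated, returning one mean per input column.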

# Sample data: 40 rows, 5 columns named weight1..weight5
set.seed(2017)
df <- as.data.frame(matrix(runif(200), ncol = 5))
colnames(df) <- paste0("weight", 1:5)

# Your function
find_mean = function(...) {
  input_list = list(...)
  output_list = sapply(input_list,mean, na.rm=TRUE)
  return(output_list)
}

find_mean(df$weight1, df$weight2, df$weight3)
#[1] 0.4736851 0.5569710 0.4300163
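
A possible reading of the reported error, offered as an assumption since the original data is not shown: find_mean() returns one mean per column (three values here), not one per row. Assigning a vector shorter than nrow(df) to a new data frame column fails unless the row count is an exact multiple of the vector's length, in which case R recycles the values silently. That could be why the height variables raised no error. A minimal sketch on the sample data above:

```r
# Sample data as above: 40 rows, 5 weight columns
set.seed(2017)
df <- as.data.frame(matrix(runif(200), ncol = 5))
colnames(df) <- paste0("weight", 1:5)

# Same behavior as the function from the question
find_mean <- function(...) {
  sapply(list(...), mean, na.rm = TRUE)
}

col_means <- find_mean(df$weight1, df$weight2, df$weight3)
length(col_means)  # one value per column passed in
nrow(df)           # number of rows in the data frame

# 40 is not a multiple of 3, so the assignment raises an error;
# tryCatch() captures the message instead of stopping the script
result <- tryCatch(
  {
    df$mean_weight <- col_means
    "assigned"
  },
  error = function(e) conditionMessage(e)
)
result

# With two columns, 40 is a multiple of 2, so the two column means
# are recycled down the rows without any error or warning
df$recycled <- find_mean(df$weight1, df$weight2)
head(df$recycled)
```

The recycled column illustrates the quieter failure mode: no error is raised, but the values are still column means repeated down the rows, not per-respondent means.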

You can also get the same column means as find_mean() in one line:

sapply(c("weight1", "weight2", "weight3"), function(x) mean(df[, x]))
#  weight1   weight2   weight3
#0.4736851 0.5569710 0.4300163
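
If the goal is one mean per respondent (row), as the question describes, one common option is base R's rowMeans() over the relevant columns. The following is a minimal sketch on the same sample data, not necessarily what the asker's data requires. The column name mean_weight comes from the question; the helper row_mean_by_prefix() is hypothetical, and selecting columns by a name prefix followed by digits is an assumption about how the variables are named.

```r
# Sample data as above: 40 rows, 5 weight columns
set.seed(2017)
df <- as.data.frame(matrix(runif(200), ncol = 5))
colnames(df) <- paste0("weight", 1:5)

# Row-wise mean of three named columns: one value per row
df$mean_weight <- rowMeans(df[, c("weight1", "weight2", "weight3")],
                           na.rm = TRUE)

# Hypothetical helper: row-wise mean of every column whose name is
# the given prefix followed by digits (e.g. weight1, weight2, ...)
row_mean_by_prefix <- function(data, prefix) {
  cols <- grep(paste0("^", prefix, "[0-9]+$"), names(data), value = TRUE)
  # drop = FALSE keeps a data frame even if only one column matches
  rowMeans(data[, cols, drop = FALSE], na.rm = TRUE)
}

df$mean_weight_all <- row_mean_by_prefix(df, "weight")
head(df[, c("mean_weight", "mean_weight_all")])
```

Under the same naming assumption, the height variables could be handled with row_mean_by_prefix(df, "height").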
