Computing the median or the mean depending on the value of a column

Time: 2018-01-29 16:23:07

Tags: r mean median

I am trying to compute either the median or the mean, depending on the value of a column.

Consider the following DF:

DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF)<- c("name","sample_1","sample_2", "sample_3", "median_mean", "Frequence")

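I want to fill the "median_mean" column with the median or the mean of the 3 samples in each row, depending on the Frequence column. If Frequence is greater than or equal to 10, use the median; otherwise, use the mean.

Keep in mind that there will not always be 3 samples, so I can't use columns 2:4. Their names, however, will always be sample_X.

Can anyone give me a hand?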

4 answers:

Answer 0 (score: 3)

DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF)<- c("name","sample_1","sample_2", "sample_3", "median_mean", "Frequence")

# Median where Frequence >= 10, mean otherwise
DF$median_mean <- ifelse(DF$Frequence >= 10,
                         apply(DF[grep("sample_", names(DF))], 1L, median),
                         apply(DF[grep("sample_", names(DF))], 1L, mean))

Explanation

We apply median and mean to the relevant columns, row by row, with:

  • apply(DF[grep("sample_", names(DF))], 1L, median)

  • apply(DF[grep("sample_", names(DF))], 1L, mean)

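But for each row we return only the value we want, using ifelse, the vectorized form of the ternary operator.

The code also works for any number of columns named sample_X, because the column selection is generalized by searching the column names with grep("sample_", names(DF)).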
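To make the mechanics concrete, here is a minimal sketch on a made-up data frame with four sample_ columns instead of three. The data frame `toy` and its values are assumptions for illustration only; the pattern `^sample_` is a slightly stricter variant of the one used in the answer.

```r
# Hypothetical toy data with four sample_ columns.
toy <- data.frame(
  name      = c("a", "b", "c"),
  sample_1  = c(1, 2, 3),
  sample_2  = c(1, 2, 3),
  sample_3  = c(1, 2, 30),
  sample_4  = c(9, 10, 4),
  Frequence = c(8, 10, 12)
)

# Select every column whose name starts with "sample_".
cols <- grep("^sample_", names(toy))

# Both candidate results are computed for every row ...
row_median <- apply(toy[cols], 1L, median)
row_mean   <- apply(toy[cols], 1L, mean)

# ... and ifelse() picks one of them element by element.
toy$median_mean <- ifelse(toy$Frequence >= 10, row_median, row_mean)
toy$median_mean  # 3.0 (mean), 2.0 (median), 3.5 (median)
```

Note that ifelse() evaluates both branches in full, so the median and the mean are each computed for all rows before one is chosen. For small data frames this cost is negligible.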

Answer 1 (score: 1)

Loop over the rows and pick the matching function (match.fun) based on the Frequence column:

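# Logical index of the sample_ columns
ix <- grepl("sample_", colnames(DF), fixed = TRUE)

DF$median_mean <- apply(DF, 1, function(i){
  # apply() passes each row as a character vector here, so convert before comparing;
  # Frequence is looked up by name because its position shifts with the number of samples
  myFun <- match.fun(ifelse(as.numeric(i["Frequence"]) >= 10, "median", "mean"))
  myFun(as.numeric(i[ix]))
})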
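One thing to keep in mind with apply() over a whole data frame: because of the character "name" column, every row reaches the function as text, and comparing text with a number compares strings. The sketch below illustrates this and shows one possible way around it, assuming it is acceptable to loop over row indices with vapply() and to keep only the numeric sample columns in a matrix. The names `samples`, `r` and `f` are illustrative.

```r
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF) <- c("name", "sample_1", "sample_2", "sample_3", "median_mean", "Frequence")

# The "name" column forces the whole matrix to character.
is.character(as.matrix(DF))   # TRUE

# Comparing text with a number compares strings, not values.
"8" >= 10                     # TRUE: "8" sorts after "10"

# Keeping only the numeric sample columns sidesteps the coercion.
samples <- as.matrix(DF[grep("^sample_", names(DF))])

DF$median_mean <- vapply(seq_len(nrow(DF)), function(r) {
  # match.fun() turns the function name into the function itself.
  f <- match.fun(if (DF$Frequence[r] >= 10) "median" else "mean")
  f(samples[r, ])
}, numeric(1))
```

Here a plain if/else is enough inside the function, since each call handles a single row.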

Answer 2 (score: 0)

This works; it uses grep to get the column indices:

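# Look up the sample columns once, outside the loop
cols <- grep("sample", names(DF))

for(i in seq_len(nrow(DF))){
  if(DF$Frequence[i] >= 10){
    # Frequence >= 10: median, as the question asks
    DF$median_mean[i] <- median(unlist(DF[i,cols]))
  }else{
    # otherwise: mean
    DF$median_mean[i] <- mean(unlist(DF[i,cols]))
  }
}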
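A side note on turning one row of sample columns into a plain vector inside such a loop: as.integer() silently drops fractional parts. That is harmless for the integer data in the question, but it would change the result for fractional samples. A small sketch with made-up values (`one_row` and `vals` are illustrative names):

```r
# Hypothetical one-row example with fractional sample values.
one_row <- data.frame(sample_1 = 1.5, sample_2 = 2.5, sample_3 = 4)
vals <- unlist(one_row)      # named numeric vector: 1.5, 2.5, 4

mean(as.integer(vals))       # fractions dropped before averaging
mean(vals)                   # fractions kept
```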

Answer 3 (score: 0)

DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF)<- c("name","sample_1","sample_2", "sample_3", "median_mean", "Frequence")

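# Rows with Frequence >= 10 get the median, all other rows get the mean.
# (Using > 10 and < 10 would leave rows with Frequence == 10 unfilled.)
DF[DF$Frequence>=10,]$median_mean <- apply(DF[DF$Frequence>=10, grep("sample_",names(DF))], 1, median)
DF[DF$Frequence<10,]$median_mean <- rowMeans(DF[DF$Frequence<10, grep("sample_",names(DF))])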
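When a column is filled in two passes like this, the two row masks have to cover every row between them. The sketch below shows a quick way to check that, and a variant that builds the second mask by negating the first so that no row can be missed. The names `strict_cover`, `use_median` and `cols` are illustrative assumptions.

```r
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF) <- c("name", "sample_1", "sample_2", "sample_3", "median_mean", "Frequence")

# Strict comparisons on both sides leave Frequence == 10 uncovered.
strict_cover <- DF$Frequence > 10 | DF$Frequence < 10
sum(!strict_cover)  # number of rows that neither mask would touch

# Negating one mask puts every row in exactly one group.
use_median <- DF$Frequence >= 10
cols <- grep("^sample_", names(DF))

# drop = FALSE keeps a data frame even if only one sample column matches.
DF$median_mean[use_median]  <- apply(DF[use_median, cols, drop = FALSE], 1, median)
DF$median_mean[!use_median] <- rowMeans(DF[!use_median, cols, drop = FALSE])
```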