I'm trying to compute either the median or the mean, depending on the value of a column.
Consider the following DF:
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF)<- c("name","sample_1","sample_2", "sample_3", "median_mean", "Frequence")
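I want to fill the "median_mean" column with either the median or the mean of the 3 samples in each row, depending on the Frequence column. If Frequence is greater than or equal to 10, use the median; otherwise, use the mean.
Keep in mind that there won't always be 3 samples, so I can't hard-code columns 2:4. Their names, however, will always be sample_X.
Can anyone give me a hand?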
Answer 0 (score: 3)
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF)<- c("name","sample_1","sample_2", "sample_3", "median_mean", "Frequence")
DF$median_mean = ifelse(DF$Frequence>=10, apply(DF[grep("sample_", names(DF))], 1L, median), apply(DF[grep("sample_", names(DF))], 1L, mean))
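We apply median and mean to the relevant columns with:
apply(DF[grep("sample_", names(DF))], 1L, median)
and
apply(DF[grep("sample_", names(DF))], 1L, mean)
and then use ifelse, the vectorized form of the ternary operator, to return only the value we want for each row.
This code also works for any number of columns named sample_X, because the column selection is generalized by searching the column names with grep("sample_", names(DF)).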
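As a minimal sketch of the same idea (a variation, not the answer's exact code), both row-wise statistics can be computed once and then selected with ifelse. The names sample_cols, row_median and row_mean are introduced here purely for illustration.

```r
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF) <- c("name", "sample_1", "sample_2", "sample_3", "median_mean", "Frequence")

# Positions of every column whose name starts with "sample_" (illustrative name)
sample_cols <- grep("^sample_", names(DF))

# Compute each statistic once for all rows
row_median <- apply(DF[sample_cols], 1, median)
row_mean   <- rowMeans(DF[sample_cols])

# ifelse() selects element-wise: median where Frequence >= 10, mean otherwise
DF$median_mean <- ifelse(DF$Frequence >= 10, row_median, row_mean)

head(DF[c("sample_1", "sample_2", "sample_3", "Frequence", "median_mean")], 3)
```

Note that ifelse evaluates both branches for every row and only then picks between them, which is harmless for cheap summaries like these.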
Answer 1 (score: 1)
Loop over the rows and pick the matching function (match.fun) according to the Frequence column:
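# Logical index of the sample_ columns
ix <- grepl("sample_", colnames(DF), fixed = TRUE)
DF$median_mean <- apply(DF, 1, function(i) {
  # apply() passes each row as character here, so convert before comparing
  myFun <- match.fun(if (as.numeric(i["Frequence"]) >= 10) "median" else "mean")
  myFun(as.numeric(i[ix]))
})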
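One detail worth keeping in mind with this approach: because DF contains a character column, apply() first converts the data frame to a character matrix, so each row reaches the function as strings. The sketch below, which assumes the example DF from the question, illustrates this and why converting with as.numeric before comparing is the safer habit.

```r
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF) <- c("name", "sample_1", "sample_2", "sample_3", "median_mean", "Frequence")

# apply() works on as.matrix(DF); one character column makes every cell a string
row1 <- as.matrix(DF)[1, ]
is.character(row1)                    # TRUE

# Comparing a string with a number falls back to string comparison
"9" >= 10                             # TRUE, because "9" sorts after "10"

# Converting first gives the intended numeric comparison
as.numeric("9") >= 10                 # FALSE
as.numeric(row1["Frequence"]) >= 10   # FALSE (Frequence is 8 in row 1)
```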
Answer 2 (score: 0)
This works, using grep to get the column numbers:
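# Column numbers of the sample columns, computed once outside the loop
cols <- grep("sample", names(DF))
for(i in seq_len(nrow(DF))){
  # Median when Frequence >= 10, mean otherwise
  if(DF$Frequence[i] >= 10){
    DF$median_mean[i] <- median(as.numeric(DF[i, cols]))
  }else{
    DF$median_mean[i] <- mean(as.numeric(DF[i, cols]))
  }
}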
Answer 3 (score: 0)
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF)<- c("name","sample_1","sample_2", "sample_3", "median_mean", "Frequence")
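# Rows with Frequence >= 10 get the median (>= so that Frequence == 10 is not skipped)
DF[DF$Frequence >= 10, ]$median_mean <- apply(DF[DF$Frequence >= 10, grep("sample_", names(DF))], 1, median)
# All remaining rows get the mean
DF[DF$Frequence < 10, ]$median_mean <- rowMeans(DF[DF$Frequence < 10, grep("sample_", names(DF))])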
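Since the number of sample columns can vary, the subset-assignment idea can be wrapped in a small function. The following is only a sketch under stated assumptions: fill_median_mean, its threshold and pattern arguments, and the DF4 example data are all hypothetical names introduced here, not part of the original answers.

```r
# Hypothetical helper: fills median_mean from all columns matching `pattern`
fill_median_mean <- function(df, threshold = 10, pattern = "^sample_") {
  cols <- grep(pattern, names(df))
  use_median <- df$Frequence >= threshold
  # Work on a numeric matrix; drop = FALSE keeps it 2-D even for a single row
  m <- as.matrix(df[cols])
  df$median_mean[use_median]  <- apply(m[use_median, , drop = FALSE], 1, median)
  df$median_mean[!use_median] <- rowMeans(m[!use_median, , drop = FALSE])
  df
}

# Example with four sample columns instead of three
DF4 <- data.frame(name = "name", sample_1 = 1:4, sample_2 = 3, sample_3 = 2:5,
                  sample_4 = 10, median_mean = 0, Frequence = c(8, 9, 10, 11))
DF4 <- fill_median_mean(DF4)
DF4$median_mean  # 4.0 4.5 3.5 4.5
```

The first two rows (Frequence 8 and 9) receive the mean, and the last two (Frequence 10 and 11) receive the median, regardless of how many sample_X columns are present.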