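I have 3 matrices, each storing one of three replicate measurements (matrix 1 holds measurement 1, matrix 2 holds measurement 2, and so on).
They have the following structure: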
> a1
        ACTIN     18S       TET1      TET2      TET3
Control 25.943441 22.62984  <NA>      34.063107 34.034756
Sample1 24.48504  20.04858  <NA>      32.37173  32.341072
Sample2 25.265867 19.680647 28.086248 33.76187  33.41289
Sample3 24.441484 18.146513 <NA>      32.811428 31.22825
> a2
        ACTIN     18S       TET1      TET2      TET3
Control 25.980696 22.393877 <NA>      34.548923 33.7815
Sample1 24.263775 20.073978 27.23082  32.27775  32.343292
Sample2 25.25487  19.680494 27.214449 33.70534  33.48968
Sample3 24.26332  18.108198 <NA>      32.769787 31.19895
> a3
        ACTIN     18S       TET1      TET2      TET3
Control 25.937397 22.429556 30.020935 33.98415  33.858604
Sample1 24.44776  20.090088 28.328804 32.317287 32.291912
Sample2 25.148333 19.537455 <NA>      33.83607  33.3961
Sample3 24.242998 18.335524 <NA>      32.788536 31.147346
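I want to create a new matrix containing the median of the 3 measurements.
Ideally, the first column stays unchanged.
If there is no value (Undetermined), the result should be NA.
I would like a matrix of medians, so something like this: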
median(a1[i,j], a2[i,j], a2[i,j])
I tried the following, with 2 for loops iterating over the array:
med<-matrix(NA, nrow(a1), ncol(a1))
for(i in ncol(a1)){
for(j in nrow(a1)){
med[i,j]<-median(a1[i,j], a2[i,j], a2[i,j])
}
}
But this gives me values that are clearly not the median, and I feel I am overcomplicating it.
Thanks!
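Answer 0 (score: 2)
You can first replace "Undetermined" with NA, and then you automatically get NA as the median for those cells. I didn't want to type all those numbers, so I just used 1 to 5, but it works for any numbers.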
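# stringsAsFactors = FALSE keeps the text columns as character in older R versions too
a1 <- data.frame(c("Control", "Sample1", "Sample2", "Sample3"), 1, 2, c("Undetermined", "Undetermined", 3, "Undetermined"), 4, 5, stringsAsFactors = FALSE)
a2 <- data.frame(c("Control", "Sample1", "Sample2", "Sample3"), 1, 2, c("Undetermined", 3, 3, "Undetermined"), 4, 5, stringsAsFactors = FALSE)
a3 <- data.frame(c("Control", "Sample1", "Sample2", "Sample3"), 1, 2, c(3, 3, "Undetermined", "Undetermined"), 4, 5, stringsAsFactors = FALSE)
names(a1) <- names(a2) <- names(a3) <- c("Sample", "CT ACTIN", "CT 18S", "CT TET1", "CT TET2", "CT TET3")
# Replace the "Undetermined" marker with a real missing value
a1[a1 == "Undetermined"] <- NA
a2[a2 == "Undetermined"] <- NA
a3[a3 == "Undetermined"] <- NA
# One median per cell; column 1 holds the sample names, so it is skipped
med <- matrix(NA_real_, nrow = nrow(a1), ncol = ncol(a1) - 1)
for (i in seq_len(nrow(a1))) {
  for (j in 2:ncol(a1)) {
    # as.numeric() is needed because "CT TET1" is still stored as text
    med[i, j - 1] <- median(as.numeric(c(a1[i, j], a2[i, j], a3[i, j])))
  }
}
# Put the sample names back in front of the medians
med <- data.frame(a1[, 1], med)
names(med) <- c("Sample", "CT ACTIN", "CT 18S", "CT TET1", "CT TET2", "CT TET3")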
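As far as I can tell, the loop in the question goes wrong for reasons that are easy to check in isolation: `median()` treats only its first argument as data (its signature is `median(x, na.rm = FALSE, ...)`), `for (i in ncol(a1))` iterates over a single number rather than over `1:ncol(a1)`, and the row and column indices are swapped. A small sketch with made-up numbers (`x`, `y`, `z` and `m` are hypothetical, not the question's data):

```r
# Made-up values standing in for a1[i, j], a2[i, j], a3[i, j]
x <- 22.4
y <- 25.9
z <- 30.0

# Only x is treated as data; y is matched to na.rm and z falls into "..."
wrong <- median(x, y, z)

# Combine the values into one vector first
right <- median(c(x, y, z))

print(wrong)  # 22.4, just the first value
print(right)  # 25.9, the actual median

# ncol() returns a single number, so a for loop over it runs exactly once
m <- matrix(0, nrow = 4, ncol = 5)
wrong_idx <- ncol(m)            # 5
right_idx <- seq_len(ncol(m))   # 1 2 3 4 5
print(wrong_idx)
print(right_idx)
```

So wrapping the values in `c()` and looping over `seq_len()` (or `1:n`) are the two changes that matter.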
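Answer 1 (score: 1)
You can use mapply and reshape the result back into a matrix. Assuming your data is originally a character matrix, which I inferred from the <NA> entries, a reproducible solution would be: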
dat <- mapply(function(...) median(as.numeric(c(...))), a1, a2, a3)
# This emits "NAs introduced by coercion" warnings, which you can ignore here:
# they come from as.numeric() turning the string "<NA>" into a numeric NA.
matrix(dat, nrow(a1), ncol(a1), dimnames = dimnames(a1))
#         ACTIN    X18S     TET1 TET2     TET3
# Control 25.94344 22.42956 NA   34.06311 33.85860
# Sample1 24.44776 20.07398 NA   32.31729 32.34107
# Sample2 25.25487 19.68049 NA   33.76187 33.41289
# Sample3 24.26332 18.14651 NA   32.78854 31.19895
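A variation on the same idea, not part of the original answer: stack the replicates into a 3-dimensional array and let `apply()` take the median over the third dimension. The sketch below uses small hypothetical 2 x 2 character matrices (`r1`, `r2`, `r3`) rather than the real data, and a hypothetical helper `to_numeric()` that converts the `"<NA>"` strings to numeric `NA`.

```r
# Hypothetical helper: character matrix -> numeric matrix, keeping dimnames.
# Strings that are not numbers (such as "<NA>") become NA.
to_numeric <- function(m) {
  suppressWarnings(storage.mode(m) <- "double")
  m
}

# Hypothetical replicates standing in for a1, a2, a3
nms <- list(c("s1", "s2"), c("g1", "g2"))
r1 <- matrix(c("1", "4", "<NA>", "7"), nrow = 2, dimnames = nms)
r2 <- matrix(c("2", "5", "3", "9"), nrow = 2, dimnames = nms)
r3 <- matrix(c("3", "6", "5", "8"), nrow = 2, dimnames = nms)

# Stack into a 2 x 2 x 3 array: the third dimension indexes the replicate
arr <- array(
  c(to_numeric(r1), to_numeric(r2), to_numeric(r3)),
  dim = c(dim(r1), 3),
  dimnames = c(nms, list(NULL))
)

# Median over replicates for every (row, column) cell
med_strict  <- apply(arr, c(1, 2), median)                # NA if any replicate is NA
med_lenient <- apply(arr, c(1, 2), median, na.rm = TRUE)  # ignores missing replicates

print(med_strict)
print(med_lenient)
```

Whether a cell with one missing replicate should be `NA` or the median of the remaining values is a judgment call; `na.rm` lets you pick either.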
Data:
a1 <- structure(c("25.94344", "24.48504", "25.26587", "24.44148", "22.62984",
"20.04858", "19.68065", "18.14651", "<NA>", "<NA>", "28.086248",
"<NA>", "34.06311", "32.37173", "33.76187", "32.81143", "34.03476",
"32.34107", "33.41289", "31.22825"), .Dim = 4:5, .Dimnames = list(
c("Control", "Sample1", "Sample2", "Sample3"), c("ACTIN",
"X18S", "TET1", "TET2", "TET3")))
a2 <- structure(c("25.98070", "24.26377", "25.25487", "24.26332", "22.39388",
"20.07398", "19.68049", "18.10820", "<NA>", "27.23082", "27.214449",
"<NA>", "34.54892", "32.27775", "33.70534", "32.76979", "33.78150",
"32.34329", "33.48968", "31.19895"), .Dim = 4:5, .Dimnames = list(
c("Control", "Sample1", "Sample2", "Sample3"), c("ACTIN",
"X18S", "TET1", "TET2", "TET3")))
a3 <- structure(c("25.93740", "24.44776", "25.14833", "24.24300", "22.42956",
"20.09009", "19.53746", "18.33552", "30.020935", "28.328804",
"<NA>", "<NA>", "33.98415", "32.31729", "33.83607", "32.78854",
"33.85860", "32.29191", "33.39610", "31.14735"), .Dim = 4:5, .Dimnames = list(
c("Control", "Sample1", "Sample2", "Sample3"), c("ACTIN",
"X18S", "TET1", "TET2", "TET3")))
Answer 2 (score: 0)
Assuming your dataset is in the form you posted before the edit:
> a1
#    Sample  CT ACTIN CT 18S   CT TET1      CT TET2  CT TET3
#1:  Control 25.94344 22.62984 Undetermined 34.06311 34.03476
#2:  Sample1 24.48504 20.04858 Undetermined 32.37173 32.34107
#3:  Sample2 25.26587 19.68065 28.086248    33.76187 33.41289
#4:  Sample3 24.44148 18.14651 Undetermined 32.81143 31.22825
You can use mget() to retrieve the objects in your environment whose names match a[[:digit:]], and bind_rows() to stack them into one table:
library(dplyr)
dat <- bind_rows(mget(ls(pattern = "a[[:digit:]]")))
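One caveat with this pattern: `a[[:digit:]]` is unanchored, so it also matches names such as `data1`. A minimal base R sketch, using a hypothetical scratch environment and made-up object names, shows the difference an anchored pattern makes:

```r
# Hypothetical scratch environment so the example does not touch your workspace
env <- new.env()
assign("a1", 1, envir = env)
assign("a2", 2, envir = env)
assign("data1", 99, envir = env)  # unrelated object whose name contains "a1"

# Unanchored pattern, as used above: also picks up "data1"
loose <- ls(envir = env, pattern = "a[[:digit:]]")

# Anchored pattern: only names that are exactly "a" followed by digits
strict <- ls(envir = env, pattern = "^a[[:digit:]]+$")

# mget() returns a named list of the matching objects
objs <- mget(strict, envir = env)

print(loose)
print(strict)
```

If your workspace only contains `a1`, `a2` and `a3`, both patterns give the same result.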
Then use na_if() to replace "Undetermined" with NA, convert every column except Sample to numeric, and compute the medians with median() inside summarise_each():
dat %>%
na_if("Undetermined") %>%
mutate_each(funs(as.numeric), -Sample) %>%
group_by(Sample) %>%
summarise_each(funs(median(., na.rm = TRUE)), -Sample)
which gives:
# A tibble: 4 x 6
#   Sample  CT ACTIN CT 18S   CT TET1  CT TET2  CT TET3
#   <chr>   <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#1  Control 25.94344 22.42956 30.02094 34.06311 33.85860
#2  Sample1 24.44776 20.07398 27.77981 32.31729 32.34107
#3  Sample2 25.25487 19.68049 27.65035 33.76187 33.41289
#4  Sample3 24.26332 18.14651 NA       32.78854 31.19895
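Note that `na.rm = TRUE` is why this result has values in CT TET1 where a median without it would be NA: the median is taken over whichever replicates were determined. A base R sketch of just that effect, using a hypothetical two-sample cut of the TET1 column (the three Control and three Sample1 readings from the question) and `tapply()` in place of the dplyr pipeline:

```r
# Hypothetical miniature of the stacked table: one row per sample per replicate
dat <- data.frame(
  Sample = rep(c("Control", "Sample1"), times = 3),
  TET1 = c("Undetermined", "Undetermined",   # replicate 1
           "Undetermined", "27.23082",       # replicate 2
           "30.020935", "28.328804"),        # replicate 3
  stringsAsFactors = FALSE
)

# Turn the marker into a real NA, then convert the column to numeric
dat$TET1[dat$TET1 == "Undetermined"] <- NA
dat$TET1 <- as.numeric(dat$TET1)

# Median per sample, with and without dropping missing replicates
lenient <- tapply(dat$TET1, dat$Sample, median, na.rm = TRUE)
strict  <- tapply(dat$TET1, dat$Sample, median)

print(lenient)  # Control 30.020935, Sample1 27.779812
print(strict)   # both NA
```

With a single determined replicate (Control), the "median" is simply that one value, so drop `na.rm = TRUE` if you would rather report NA in that case.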