Element-wise median of 3 matrices in R

Date: 2016-07-13 14:09:59

Tags: r matrix median

I have 3 matrices, each storing one of three replicate measurements (matrix 1 holds measurement 1, matrix 2 holds measurement 2, and so on).

They have the following structure:

> a1
            ACTIN       18S      TET1      TET2      TET3
Control 25.943441  22.62984      <NA> 34.063107 34.034756
Sample1  24.48504  20.04858      <NA>  32.37173 32.341072
Sample2 25.265867 19.680647 28.086248  33.76187  33.41289
Sample3 24.441484 18.146513      <NA> 32.811428  31.22825
> a2
            ACTIN       18S      TET1      TET2      TET3
Control 25.980696 22.393877      <NA> 34.548923   33.7815
Sample1 24.263775 20.073978  27.23082  32.27775 32.343292
Sample2  25.25487 19.680494 27.214449  33.70534  33.48968
Sample3  24.26332 18.108198      <NA> 32.769787  31.19895
> a3
            ACTIN       18S      TET1      TET2      TET3
Control 25.937397 22.429556 30.020935  33.98415 33.858604
Sample1  24.44776 20.090088 28.328804 32.317287 32.291912
Sample2 25.148333 19.537455      <NA>  33.83607   33.3961
Sample3 24.242998 18.335524      <NA> 32.788536 31.147346

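I would like to create a new matrix containing the median of the 3 measurements. Ideally, the first column stays unchanged. If there is no value (Undetermined), the result should be NA.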

I would like a matrix of medians, so something like this:

median(a1[i,j], a2[i,j], a2[i,j])

I tried the following, with two for loops iterating over the array:

med<-matrix(NA, nrow(a1), ncol(a1))    
for(i in ncol(a1)){
      for(j in nrow(a1)){
        med[i,j]<-median(a1[i,j], a2[i,j], a2[i,j])
      }
    }

But this gives me values that are clearly not the median, and it feels overly complicated.

Thanks!

3 answers:

Answer 0 (score: 2)

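You can first replace "Undetermined" with NA, and the median will then come out as NA automatically. I did not want to type in all those numbers, so I just used 1 to 5, but it works for any numbers.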

a1 <- data.frame(c("Control", "Sample1", "Sample2", "Sample3"), 1, 2, c("Undetermined", "Undetermined", 3, "Undetermined"), 4, 5) 
a2 <- data.frame(c("Control", "Sample1", "Sample2", "Sample3"), 1, 2, c("Undetermined", 3, 3, "Undetermined"), 4, 5) 
a3 <- data.frame(c("Control", "Sample1", "Sample2", "Sample3"), 1, 2, c(3, 3, "Undetermined", "Undetermined"), 4, 5) 
names(a1) <- names(a2) <- names(a3) <- c("Sample", "CT ACTIN", "CT 18S", "CT TET1", "CT TET2", "CT TET3")
a1[a1 == "Undetermined"] <- NA
a2[a2 == "Undetermined"] <- NA
a3[a3 == "Undetermined"] <- NA

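# One median per measurement column; column 1 (Sample) is carried over unchanged
med <- matrix(NA_real_, nrow = nrow(a1), ncol = ncol(a1) - 1)
for (i in seq_len(nrow(a1))) {
  for (j in 2:ncol(a1)) {
    # as.character() first, so this also works if a column was read in as a factor
    vals <- as.numeric(c(as.character(a1[i, j]), as.character(a2[i, j]), as.character(a3[i, j])))
    # median() returns NA as soon as one replicate is NA
    med[i, j - 1] <- median(vals)
  }
}

med <- data.frame(a1[, 1], med)
names(med) <- names(a1)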
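For reference, here is a minimal sketch of why the loop in the question did not return medians. The values are made up for illustration. `median()` treats only its first argument as data, so extra positional arguments are not included in the calculation, and `for (i in n)` visits the single value `n` rather than `1, 2, ..., n`. The question's loop additionally passed `a2` twice instead of `a3`, and used the column index as the row index of `med`.

```r
# Hypothetical values standing in for one cell of a1, a2 and a3
x <- 3
y <- 1
z <- 2

# Only x is used as data; y is matched to the na.rm argument and z is ignored
wrong <- median(x, y, z)

# Combining the values with c() gives median() a single vector to work on
right <- median(c(x, y, z))

# for (i in 3) runs once with i = 3; seq_len(3) runs for 1, 2, 3
visited_wrong <- c()
for (i in 3) visited_wrong <- c(visited_wrong, i)

visited_right <- c()
for (i in seq_len(3)) visited_right <- c(visited_right, i)
```

Here `wrong` is simply `x`, while `right` is the actual median of the three values.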

Answer 1 (score: 1)

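You can use mapply and reshape the result back into a matrix. Assuming your data is originally a character matrix, which I infer from the <NA>, a reproducible solution looks like this: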

dat <- mapply(function(...) median(as.numeric(c(...))), a1, a2, a3)
# This emits a warning that can be ignored: it comes from as.numeric() turning the string "<NA>" into a numeric NA
matrix(dat, nrow(a1), ncol(a1), dimnames = dimnames(a1))

#            ACTIN     X18S TET1     TET2     TET3
# Control 25.94344 22.42956   NA 34.06311 33.85860
# Sample1 24.44776 20.07398   NA 32.31729 32.34107
# Sample2 25.25487 19.68049   NA 33.76187 33.41289
# Sample3 24.26332 18.14651   NA 32.78854 31.19895

Data

a1 <- structure(c("25.94344", "24.48504", "25.26587", "24.44148", "22.62984", 
"20.04858", "19.68065", "18.14651", "<NA>", "<NA>", "28.086248", 
"<NA>", "34.06311", "32.37173", "33.76187", "32.81143", "34.03476", 
"32.34107", "33.41289", "31.22825"), .Dim = 4:5, .Dimnames = list(
    c("Control", "Sample1", "Sample2", "Sample3"), c("ACTIN", 
    "X18S", "TET1", "TET2", "TET3")))

a2 <- structure(c("25.98070", "24.26377", "25.25487", "24.26332", "22.39388", 
"20.07398", "19.68049", "18.10820", "<NA>", "27.23082", "27.214449", 
"<NA>", "34.54892", "32.27775", "33.70534", "32.76979", "33.78150", 
"32.34329", "33.48968", "31.19895"), .Dim = 4:5, .Dimnames = list(
    c("Control", "Sample1", "Sample2", "Sample3"), c("ACTIN", 
    "X18S", "TET1", "TET2", "TET3")))

a3 <- structure(c("25.93740", "24.44776", "25.14833", "24.24300", "22.42956", 
"20.09009", "19.53746", "18.33552", "30.020935", "28.328804", 
"<NA>", "<NA>", "33.98415", "32.31729", "33.83607", "32.78854", 
"33.85860", "32.29191", "33.39610", "31.14735"), .Dim = 4:5, .Dimnames = list(
    c("Control", "Sample1", "Sample2", "Sample3"), c("ACTIN", 
    "X18S", "TET1", "TET2", "TET3")))
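As an alternative to the mapply() approach in this answer, here is a minimal sketch that stacks the replicates into a 3-dimensional array and takes the median across the third dimension with apply(). It assumes the matrices are already numeric (for character matrices like the ones above, convert with as.numeric() first). The matrices `m1`, `m2`, `m3` and their values are made up for illustration.

```r
# Made-up 2 x 2 replicates standing in for numeric versions of a1, a2, a3
dn <- list(c("Control", "Sample1"), c("ACTIN", "TET1"))
m1 <- matrix(c(1, 4, NA, 8), nrow = 2, dimnames = dn)
m2 <- matrix(c(2, 5, 7, 9), nrow = 2, dimnames = dn)
m3 <- matrix(c(3, 6, NA, 7), nrow = 2, dimnames = dn)

# Stack the replicates into a rows x columns x replicate array
arr <- array(c(m1, m2, m3),
             dim = c(dim(m1), 3),
             dimnames = c(dimnames(m1), list(NULL)))

# Median over the replicate dimension for every (row, column) cell;
# a cell with any NA replicate yields NA, as with the mapply() version
med <- apply(arr, c(1, 2), median)
```

The result keeps the row and column names of the input matrices. Passing `na.rm = TRUE` as an extra argument to apply() would instead take the median of the replicates that are available.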

Answer 2 (score: 0)

Assuming your datasets are in the form you posted before the edit:

> a1
#    Sample CT ACTIN   CT 18S      CT TET1  CT TET2  CT TET3
#1: Control 25.94344 22.62984 Undetermined 34.06311 34.03476
#2: Sample1 24.48504 20.04858 Undetermined 32.37173 32.34107
#3: Sample2 25.26587 19.68065    28.086248 33.76187 33.41289
#4: Sample3 24.44148 18.14651 Undetermined 32.81143 31.22825

You can use mget() to retrieve the objects in your environment whose names match a[[:digit:]], and bind_rows() to stack them together:

library(dplyr)
dat <- bind_rows(mget(ls(pattern = "a[[:digit:]]")))

Then use na_if() to replace "Undetermined" with NA, convert all columns except Sample to numeric, and compute the median() with summarise_each():

dat %>%
  na_if("Undetermined") %>%
  mutate_each(funs(as.numeric), -Sample) %>%
  group_by(Sample) %>%
  summarise_each(funs(median(., na.rm = TRUE)), -Sample)

which gives:

# A tibble: 4 x 6
#   Sample CT ACTIN   CT 18S  CT TET1  CT TET2  CT TET3
#    <chr>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#1 Control 25.94344 22.42956 30.02094 34.06311 33.85860
#2 Sample1 24.44776 20.07398 27.77981 32.31729 32.34107
#3 Sample2 25.25487 19.68049 27.65035 33.76187 33.41289
#4 Sample3 24.26332 18.14651       NA 32.78854 31.19895
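In recent dplyr releases, mutate_each(), summarise_each() and funs() are deprecated in favour of across(). The following is a rough sketch of the same pipeline in that style, not the answerer's original code. The data frames `r1`, `r2`, `r3` and their values are made up for illustration and only mimic the layout shown above.

```r
library(dplyr)

# Made-up stand-ins for a1, a2, a3 with the same layout as above
r1 <- data.frame(Sample = c("Control", "Sample1"),
                 `CT ACTIN` = c(25.9, 24.5),
                 `CT TET1` = c("Undetermined", "Undetermined"),
                 check.names = FALSE, stringsAsFactors = FALSE)
r2 <- data.frame(Sample = c("Control", "Sample1"),
                 `CT ACTIN` = c(26.0, 24.3),
                 `CT TET1` = c("Undetermined", "27.2"),
                 check.names = FALSE, stringsAsFactors = FALSE)
r3 <- data.frame(Sample = c("Control", "Sample1"),
                 `CT ACTIN` = c(25.8, 24.4),
                 `CT TET1` = c("30.0", "28.3"),
                 check.names = FALSE, stringsAsFactors = FALSE)

res <- bind_rows(r1, r2, r3) %>%
  # Turn "Undetermined" into NA column by column, then convert to numeric
  mutate(across(-Sample, ~ as.numeric(na_if(as.character(.x), "Undetermined")))) %>%
  group_by(Sample) %>%
  # Grouping columns are skipped by across(), so everything() covers the measurements
  summarise(across(everything(), ~ median(.x, na.rm = TRUE)))
```

As in the answer above, `na.rm = TRUE` means the median is taken over whichever replicates are available, so a cell is NA only when all three replicates are undetermined.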