Question

Here is my small dataset and a function:

dat <- data.frame (
 A1 = c("AA", "AA", "AA", "AA"),
 B1 = c("BB", "BB", "AB", "AB"), 
 C1 = c("AB", "BB", "AA", "AB"))

Function:

syfun <- function(x, y) {
  if (x == "AA" & y == "AA" | x == "BB" & y == "BB") {
    sxy <- 1
  }
  if (x == "AA" & y == "AB" | x == "AB" & y == "AA") {
    sxy <- 0.5
  }
  if (x == "AA" & y == "BB" | x == "BB" & y == "AA") {
    sxy <- 0
  }
  return(sxy)
}

out <- rep(NA, NROW(dat))

for (i in 1:NROW(dat)) {
  out[i] <- syfun(dat[i, 1], dat[i, 1])
}

mean(out)
1

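What I am doing here is applying the function to the first column (variable A1) against the same variable (A1) and averaging the output values. I want to save this output to a cell of a matrix.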

The same between A1 and B1:

for (i in 1:NROW(dat)) {
  out[i] <- syfun(dat[i, 1], dat[i, 2])
}
mean(out)
0.25
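The row-by-row loop can be wrapped in a small helper so that any pair of columns is scored in one call. This is a minimal sketch, not part of the original question: pair_score is a hypothetical name, mapply stands in for the explicit for loop, and the sketch assumes the complete syfun from the edit below (which returns NA instead of failing on unmatched pairs).

```r
# Same small dataset; stringsAsFactors = FALSE keeps the genotypes as character
dat <- data.frame(
  A1 = c("AA", "AA", "AA", "AA"),
  B1 = c("BB", "BB", "AB", "AB"),
  C1 = c("AB", "BB", "AA", "AB"),
  stringsAsFactors = FALSE
)

# Complete scoring function (same rules as the edited syfun in the question)
syfun <- function(x, y) {
  sxy <- NA
  if (x == "AA" & y == "AA" | x == "BB" & y == "BB") sxy <- 1
  if (x == "AA" & y == "AB" | x == "AB" & y == "AA") sxy <- 0.5
  if (x == "AA" & y == "BB" | x == "BB" & y == "AA") sxy <- 0
  if (x == "BB" & y == "AB" | x == "AB" & y == "BB") sxy <- 0.5
  if (x == "AB" & y == "AB") sxy <- 0.5
  sxy
}

# Hypothetical helper: score two columns row by row, then average the scores
pair_score <- function(data, col1, col2) {
  mean(mapply(syfun, data[[col1]], data[[col2]]))
}

pair_score(dat, "A1", "A1")  # 1
pair_score(dat, "A1", "B1")  # 0.25
```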

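Now, similar to a correlation matrix, I want to do this for all possible combinations of variables and save the results as a matrix: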

         A1    B1    C1
A1       1.0  0.25  0.5
B1       0.25  1.0  NA
C1       0.5   NA   1.0

Edit: a more complete function that does not produce NA:

syfun <- function (x, y){
  sxy <- NA
  if(x == "AA" & y == "AA" | x == "BB" & y == "BB"){
        sxy = 1
  }
  if(x == "AA" & y == "AB" | x == "AB" & y == "AA"){
        sxy = 0.5
  }
  if (x == "AA" & y == "BB"| x == "BB" & y == "AA"){
        sxy = 0
  }
  if (x == "BB" & y == "AB"| x == "AB" & y == "BB"){
        sxy = 0.5
  }

  if(x == "AB" & y ==  "AB") {
    sxy = 0.5
    }
  return(sxy)
}
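Since syfun only looks up a fixed score for each pair of genotypes, the chain of if statements could also be written as a symmetric lookup table, which is vectorized and scores a whole column at once. A sketch under two assumptions: the scores are exactly those of the edited function above, and only the genotypes AA, AB and BB occur (any other value makes the matrix indexing stop with "subscript out of bounds"). The names sim, syfun_vec and mat are illustrative.

```r
dat <- data.frame(
  A1 = c("AA", "AA", "AA", "AA"),
  B1 = c("BB", "BB", "AB", "AB"),
  C1 = c("AB", "BB", "AA", "AB"),
  stringsAsFactors = FALSE
)

# Symmetric lookup table: rows/columns are genotypes, cells are the edited syfun scores
geno <- c("AA", "AB", "BB")
sim <- matrix(c(1,   0.5, 0,
                0.5, 0.5, 0.5,
                0,   0.5, 1),
              nrow = 3, byrow = TRUE, dimnames = list(geno, geno))

# Vectorized scoring: a two-column character matrix indexes sim by row and column names
syfun_vec <- function(x, y) sim[cbind(as.character(x), as.character(y))]

# Average score for every pair of columns: inner sapply fills rows, outer sapply fills columns
vars <- names(dat)
mat <- sapply(vars, function(b) sapply(vars, function(a) mean(syfun_vec(dat[[a]], dat[[b]]))))
mat
```

With these scores the diagonal is not all 1: B1 and C1 each get 0.75 against themselves, because AB vs AB scores 0.5. If the correlation-style diagonal from the desired output is wanted, diag(mat) <- 1 can be applied afterwards.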

Answer 1

First, your function syfun has to return NA if none of the conditions match. Therefore, I added a line at the top of the function:

syfun <- function (x, y){
  sxy <- NA
  if(x == "AA" & y == "AA" | x == "BB" & y == "BB"){
        sxy = 1
  }
  if(x == "AA" & y == "AB" | x == "AB" & y == "AA"){
        sxy = 0.5
  }
  if (x == "AA" & y == "BB"| x == "BB" & y == "AA"){
        sxy = 0
  }
  return(sxy)
}
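The reason for the sxy <- NA line: in R, a variable that is only assigned inside an if branch does not exist when that branch is skipped, so returning it stops with an error (or, if a global variable of the same name happens to exist, silently returns that instead). A tiny illustration with hypothetical functions f_no_default and f_with_default:

```r
# Hypothetical function without a default: sxy only exists if the if-branch runs
f_no_default <- function(x) {
  if (x == "AA") sxy <- 1
  sxy
}

# Same function with a default value assigned up front
f_with_default <- function(x) {
  sxy <- NA
  if (x == "AA") sxy <- 1
  sxy
}

f_no_default("AA")    # 1
# Capture the error message instead of stopping (e.g. "object 'sxy' not found")
msg <- tryCatch(f_no_default("BB"), error = function(e) conditionMessage(e))
f_with_default("BB")  # NA
```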

Second, you can use outer to apply the function to all combinations of column names. For this you need Vectorize to vectorize the function:

mat <- outer(names(dat), names(dat), function(x, y) 
  Vectorize(function(a, b) mean(Vectorize(syfun)(dat[[a]], dat[[b]])))(x,y))
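Why Vectorize is needed: outer does not call the function once per cell. It calls it once with both arguments expanded to every combination and then reshapes the result into a matrix. A function built on a scalar if, like syfun, cannot handle those vector inputs; Vectorize wraps it in mapply so it is applied element by element. A minimal illustration with a hypothetical scalar_only function:

```r
# Hypothetical scalar-only function: `if` needs a single TRUE/FALSE
scalar_only <- function(a, b) if (a == b) 1 else 0

# Calling outer() directly passes length-9 vectors to `if`; since R 4.2.0 that is an
# error, and in older versions the length-1 result fails when outer() sets the dims
failed <- inherits(tryCatch(outer(1:3, 1:3, scalar_only), error = function(e) e), "error")
failed  # TRUE

# Vectorize() applies scalar_only to each (a, b) pair via mapply
m <- outer(1:3, 1:3, Vectorize(scalar_only))
m       # 3 x 3 identity matrix
```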

Third, replace the elements on the diagonal with 1:

diag(mat) <- 1

Fourth, set the row and column names:

dimnames(mat) <- list(names(dat), names(dat))

The result:

     A1   B1  C1
A1 1.00 0.25 0.5
B1 0.25 1.00  NA
C1 0.50   NA 1.0

Answer 2

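From your example it looks like you just want the ratio of the As in one column to the As in another to calculate their similarity. If that's the case, then: (I'm assuming these are genes?)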

dat <- data.frame (
 A1 = c("AA", "AA", "AA", "AA"),
 B1 = c("BB", "BB", "AB", "AB"), 
 C1 = c("AB", "BB", "AA", "AB"))

## this function takes two columns from dat, pastes all the genes together, then counts how often each letter appears. It then divides the smaller 'A' count by the larger one to give a proportion of similarity (it only does this for 'A' right now, but it could be extended to more genes if necessary)

fun <- function(x, y){
  # split the pasted string into single letters (split = '' splits into characters) and count them
  x.prop <- table(unlist(strsplit(Reduce(paste0, x), '')))
  y.prop <- table(unlist(strsplit(Reduce(paste0, y), '')))
  ans <- ifelse(x.prop['A'] > y.prop['A'], y.prop['A']/x.prop['A'], x.prop['A']/y.prop['A'])
  return(ans)
}
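If the measure really is just the ratio of 'A' counts, the counting can also be done with gsub and nchar, and because the ratio is symmetric the whole matrix falls out of a single outer call with no triangle bookkeeping. This is an alternative sketch under that assumption (smaller count divided by larger count), not the answer's exact code; count_A is a hypothetical helper. Two columns without any 'A' would give 0/0 = NaN.

```r
dat <- data.frame(
  A1 = c("AA", "AA", "AA", "AA"),
  B1 = c("BB", "BB", "AB", "AB"),
  C1 = c("AB", "BB", "AA", "AB"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total number of 'A' characters in a column of genotypes
count_A <- function(col) sum(nchar(gsub("[^A]", "", col)))

a_counts <- sapply(dat, count_A)  # A1 = 8, B1 = 2, C1 = 4

# Ratio of the smaller to the larger count for every pair of columns;
# outer() takes the row/column names from the names of a_counts
prop_mat <- outer(a_counts, a_counts, function(x, y) pmin(x, y) / pmax(x, y))
prop_mat
```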

final_mat <- matrix(ncol = ncol(dat), nrow = ncol(dat)) ## creates an empty final matrix
colnames(final_mat) <- colnames(dat)
rownames(final_mat) <- colnames(dat)

### this applies 'fun' to each pair (combination of 2) of column names
sims <- apply(combn(colnames(dat), 2), 2, function(x) fun(dat[, x[1]], dat[, x[2]]))

### combn's pair order matches the column-major order of lower.tri, so fill the
### lower triangle first and then mirror it into the upper triangle
final_mat[lower.tri(final_mat)] <- sims
final_mat[upper.tri(final_mat)] <- t(final_mat)[upper.tri(final_mat)]

diag(final_mat) <- 1

final_mat
     A1   B1  C1
A1 1.00 0.25 0.5
B1 0.25 1.00 0.5
C1 0.50 0.50 1.0
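A note on the triangle assignments: a logical index such as lower.tri(m) visits cells in column-major order. For the lower triangle that order coincides with the pair order produced by combn(n, 2); for the upper triangle it only coincides while there are three or fewer columns. The check below uses n = 4, an illustrative size not taken from the question:

```r
n <- 4
pairs <- combn(n, 2)  # columns: (1,2) (1,3) (1,4) (2,3) (2,4) (3,4)

# Cell coordinates in the order logical indexing visits them (column-major)
lower_pos <- which(lower.tri(diag(n)), arr.ind = TRUE)
upper_pos <- which(upper.tri(diag(n)), arr.ind = TRUE)

# Lower triangle: cell (i, j) with i > j lines up with pair (j, i) in combn order
lower_ok <- all(lower_pos[, "col"] == pairs[1, ] & lower_pos[, "row"] == pairs[2, ])
# Upper triangle visits (1,2) (1,3) (2,3) (1,4) ..., which differs from combn order
upper_ok <- all(upper_pos[, "row"] == pairs[1, ] & upper_pos[, "col"] == pairs[2, ])

c(lower = lower_ok, upper = upper_ok)  # lower TRUE, upper FALSE

# Safe symmetric fill: write the lower triangle, then mirror it with t()
m <- matrix(0, n, n)
m[lower.tri(m)] <- seq_len(ncol(pairs))  # placeholder values, one per pair
m[upper.tri(m)] <- t(m)[upper.tri(m)]
isSymmetric(m)  # TRUE
```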

Generate a matrix by applying a function to all possible combinations of variables in R
