Question

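I have an array, and I want a measure of how similar the values in each pair of columns are. By this I mean I want to compare the rows of each pair of columns and increment a count whenever their values match. The resulting measure would then be at its maximum for two columns that are identical.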

Basically my problem is the same as the one discussed here: R: Compare all the columns pairwise in matrix, except that I don't want empty cells to be counted.

Sample data, plus code derived from the linked page:

data1 <- c("", "B", "", "", "")
data2 <- c("A", "", "", "", "")
data3 <- c("", "", "C", "", "A")
data4 <- c("", "", "", "", "")
data5 <- c("", "", "C", "", "A")
data6 <- c("", "B", "C", "", "")

my.matrix <- cbind(data1, data2, data3, data4, data5, data6)

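similarity.matrix <- matrix(nrow=ncol(my.matrix), ncol=ncol(my.matrix))
for(col in 1:ncol(my.matrix)){
  # compare column `col` with every column, row by row
  matches <- my.matrix[,col] == my.matrix
  # number of matching rows for each column
  match.counts <- colSums(matches)
  # ignore the comparison of a column with itself
  match.counts[col] <- 0
  similarity.matrix[,col] <- match.counts
}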
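The loop relies on R's recycling rule: `my.matrix[,col]` has `nrow(my.matrix)` elements, so `==` compares it element by element against each column in turn. A small sketch on a hypothetical 3 x 2 matrix `m` (made up for illustration) shows this, and also shows that two empty strings compare as equal:

```r
# Hypothetical 3 x 2 character matrix, for illustration only
m <- cbind(x = c("a", "b", ""), y = c("a", "", ""))

# m[, "x"] has length nrow(m), so it is recycled down each column:
# element i of m[, "x"] is compared with row i of every column
cmp <- m[, "x"] == m
print(cmp)

# Column x matches itself in all 3 rows; column y matches in row 1
# ("a" == "a") and in row 3 ("" == ""), so the blank pair is counted too
print(colSums(cmp))
```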

I get:

similarity.matrix =

    V1  V2  V3  V4  V5  V6
1   0   3   2   4   2   4
2   3   0   2   4   2   2
3   2   2   0   3   5   3
4   4   4   3   0   3   3
5   2   2   5   3   0   3
6   4   2   3   3   3   0

This counts pairs of empty cells as matches.

The output I want is:

expected.output =

    V1  V2  V3  V4  V5  V6
1   0   0   0   0   0   1
2   0   0   0   0   0   0
3   0   0   0   0   2   1
4   0   0   0   0   0   0
5   0   0   2   0   0   1
6   1   0   1   0   1   0
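The desired rule can also be written directly into the loop. This is a sketch that does not come from the linked page or from the answer below: a cell counts only when the two values are equal and the cell is non-empty. Because two equal values are either both empty or both non-empty, testing `my.matrix != ""` on one side is enough. The data are the same as above, built with named arguments to `cbind()`:

```r
# Sample data from the question
my.matrix <- cbind(
  data1 = c("", "B", "", "", ""),
  data2 = c("A", "", "", "", ""),
  data3 = c("", "", "C", "", "A"),
  data4 = c("", "", "", "", ""),
  data5 = c("", "", "C", "", "A"),
  data6 = c("", "B", "C", "", "")
)

similarity.matrix <- matrix(0, nrow = ncol(my.matrix), ncol = ncol(my.matrix))
for (col in seq_len(ncol(my.matrix))) {
  # `==` and `!=` bind tighter than `&`: count a row only if the values
  # are equal AND non-empty (equal values are both empty or both not)
  matches <- my.matrix[, col] == my.matrix & my.matrix != ""
  match.counts <- colSums(matches)
  match.counts[col] <- 0  # ignore each column's match with itself
  similarity.matrix[, col] <- match.counts
}
similarity.matrix
```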

Thanks,

Matt

Answer 1

Following akrun's answer:

First, change the blank cells to NAs:

is.na(my.matrix) <- my.matrix==''
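The `is.na<-` replacement form sets every cell selected by the logical matrix `my.matrix == ''` to `NA`. Because any `==` comparison involving `NA` yields `NA` rather than `TRUE`, blank pairs stop counting as matches. A minimal illustration on a hypothetical matrix `m`:

```r
# Hypothetical matrix with blank cells, for illustration only
m <- cbind(x = c("A", "", "C"), y = c("A", "", ""))

# Set every cell where the condition is TRUE to NA (dimensions are kept)
is.na(m) <- m == ""
print(m)

# Row 1: "A" == "A" is TRUE; rows 2 and 3 involve NA, so the result is NA
print(m[, "x"] == m[, "y"])
```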

Then remove the NAs when computing match.counts:

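similarity.matrix <- matrix(nrow=ncol(my.matrix), ncol=ncol(my.matrix))

for(col in 1:ncol(my.matrix)){
  # NA == anything gives NA, so blank cells never produce TRUE
  matches <- my.matrix[,col] == my.matrix
  # na.rm=TRUE drops those NAs before counting the TRUE values
  match.counts <- colSums(matches, na.rm=TRUE)
  # ignore the comparison of a column with itself
  match.counts[col] <- 0
  similarity.matrix[,col] <- match.counts
}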
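`na.rm=TRUE` is needed because the comparison matrix now contains NAs; without it, `colSums` returns `NA` for every column that has at least one. A small sketch with a hand-written logical matrix (values made up for illustration):

```r
# Hypothetical comparison result containing NAs
matches <- cbind(a = c(TRUE, NA, FALSE), b = c(NA, NA, TRUE))

# NA propagates through the sum: both columns give NA
print(colSums(matches))

# NAs are dropped and each TRUE counts as 1: a = 1, b = 1
print(colSums(matches, na.rm = TRUE))
```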

This does give me the output I want:

    V1  V2  V3  V4  V5  V6
1   0   0   0   0   0   1
2   0   0   0   0   0   0
3   0   0   0   0   2   1
4   0   0   0   0   0   0
5   0   0   2   0   0   1
6   1   0   1   0   1   0
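The same result can also be computed without the explicit loop. This is a sketch and not part of the answer: `count_matches` is a hypothetical helper that counts non-NA matches between two columns, and `outer()` combined with `Vectorize()` applies it to every pair of column indices.

```r
# Sample data from the question, with blanks converted to NA as in the answer
my.matrix <- cbind(
  data1 = c("", "B", "", "", ""),
  data2 = c("A", "", "", "", ""),
  data3 = c("", "", "C", "", "A"),
  data4 = c("", "", "", "", ""),
  data5 = c("", "", "C", "", "A"),
  data6 = c("", "B", "C", "", "")
)
is.na(my.matrix) <- my.matrix == ""

# Hypothetical helper: rows where columns i and j hold the same non-missing
# value (comparisons involving NA are dropped by na.rm = TRUE)
count_matches <- function(i, j) sum(my.matrix[, i] == my.matrix[, j], na.rm = TRUE)

idx <- seq_len(ncol(my.matrix))
# outer() passes whole vectors of indices, so the scalar helper is vectorized
similarity.matrix <- outer(idx, idx, Vectorize(count_matches))
diag(similarity.matrix) <- 0  # ignore each column's match with itself
similarity.matrix
```

`Vectorize()` still calls the helper once per pair internally, so this mainly trades the loop for a more declarative form rather than gaining speed.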

Thanks.

R: pairwise comparison of matrix columns, ignoring empty values
