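I have a matrix of letter/number combinations, and I need to remove every column in which the same letter appears in both rows. A simple example: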
> chars <- c("A1","A2","B1","B2")
> charsmat <- combn(chars, 2)
> charsmat
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "A1" "A1" "A1" "A2" "A2" "B1"
[2,] "A2" "B1" "B2" "B1" "B2" "B2"
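When both rows of a single column contain the same letter (columns 1 and 6 in this example), I need to remove that column. I feel like I have the pieces: use gsub() or str_extract() to isolate the letters, then test whether the rows match, but I'm stuck on how to put it together. Thanks in advance for any help.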
Answer 0 (score: 3)
First, create a new matrix that extracts only the letter part of each entry:
> (charsmat.alpha <- substr(charsmat, 1, 1))
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "A"  "A"  "A"  "A"  "A"  "B"
[2,] "A"  "B"  "B"  "B"  "B"  "B"
Then take the subset of columns of charsmat in which the two rows of charsmat.alpha differ:
> charsmat[,(charsmat.alpha[1,] != charsmat.alpha[2,])]
     [,1] [,2] [,3] [,4]
[1,] "A1" "A1" "A2" "A2"
[2,] "B1" "B2" "B1" "B2"
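One caveat with the subsetting step above: if only one column survives the filter, `[` returns a plain vector unless drop = FALSE is supplied. A minimal sketch that wraps the two steps in a helper, assuming every entry begins with a single letter to compare (the name dropSameLetter is hypothetical and not part of the original answer):

```r
## Hypothetical helper (name not from the original answer).
## Assumes the first character of each entry is the letter to compare.
dropSameLetter <- function(mat) {
  ## First character of every entry; substr() keeps the matrix dimensions
  alpha <- substr(mat, 1, 1)
  ## Keep columns whose two rows start with different letters;
  ## drop = FALSE keeps the result a matrix even with one column left
  mat[, alpha[1, ] != alpha[2, ], drop = FALSE]
}

chars <- c("A1", "A2", "B1", "B2")
charsmat <- combn(chars, 2)
res <- dropSameLetter(charsmat)
res
```

For the example matrix this should keep the same four columns as the one-liner above; the only intended difference is that the result stays a matrix in the single-column case.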
Answer 1 (score: 1)
Here is a more general solution, which removes any column in which any letter in the row 1 entry matches any letter in the row 2 entry:
## Your data
chars <- c("A1","A2","B1","B2")
charsmat <- combn(chars, 2)
vetMatrix <- function(mat) {
    ## Remove non-alpha characters from matrix entries
    mm <- gsub("[^[:alpha:]]", "", mat)
    ## Construct character class regex patterns from first row
    patterns <- paste0("[", mm[1,], "]")
    xs <- mm[2,]
    ## Extract columns in which no character in first row is found in second
    mat[,!mapply("grepl", patterns, xs), drop=FALSE]
}
## Try it with your matrix ...
vetMatrix(charsmat)
#      [,1] [,2] [,3] [,4]
# [1,] "A1" "A1" "A2" "A2"
# [2,] "B1" "B2" "B1" "B2"
## ... and with a different matrix
mat <- matrix(c("AB1", "B1", "AA11", "BB22", "this", "that"), ncol=3)
mat
#      [,1]  [,2]   [,3]
# [1,] "AB1" "AA11" "this"
# [2,] "B1"  "BB22" "that"
vetMatrix(mat)
#      [,1]
# [1,] "AA11"
# [2,] "BB22"
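The regex approach above builds a character class from each row 1 entry, so an entry containing no letters at all would produce the pattern "[]", which is not a valid character class. As a hedged alternative, the same "any shared letter" test can be sketched with set intersection instead of regular expressions. The function name vetMatrixSets is hypothetical, and this is only one possible variant, not the answerer's method:

```r
## Hypothetical variant (not the answer's code): compare letter sets directly
vetMatrixSets <- function(mat) {
  ## Keep only alphabetic characters; gsub() preserves the matrix dimensions
  mm <- gsub("[^[:alpha:]]", "", mat)
  ## TRUE for a column when its two entries share at least one letter
  shared <- vapply(seq_len(ncol(mm)), function(j) {
    a <- strsplit(mm[1, j], "")[[1]]
    b <- strsplit(mm[2, j], "")[[1]]
    length(intersect(a, b)) > 0
  }, logical(1))
  ## Keep the columns with no shared letters, always as a matrix
  mat[, !shared, drop = FALSE]
}

chars <- c("A1", "A2", "B1", "B2")
charsmat <- combn(chars, 2)
res1 <- vetMatrixSets(charsmat)

mat <- matrix(c("AB1", "B1", "AA11", "BB22", "this", "that"), ncol = 3)
res2 <- vetMatrixSets(mat)

## An entry with no letters simply shares nothing, so its column is kept
mat3 <- matrix(c("12", "A1", "AB1", "B1"), ncol = 2)
res3 <- vetMatrixSets(mat3)
```

On the two matrices from the answer, this variant should select the same columns as vetMatrix(); it differs only in how a letter-free entry is handled.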