Remove columns when rows contain the same character

Time: 2012-07-17 19:07:14

Tags: r

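I have a matrix of letter/number combinations, and I need to remove every column in which the same letter appears in both rows. A simple example: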

> chars <- c("A1","A2","B1","B2")
> charsmat <- combn(chars, 2)
> charsmat
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "A1" "A1" "A1" "A2" "A2" "B1"
[2,] "A2" "B1" "B2" "B1" "B2" "B2"

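When both rows of a single column contain the same letter (columns 1 and 6 in this example), I need to remove that column. I feel like I have the pieces: using gsub() or str_extract() to isolate the letters, then testing whether the rows match, but I'm stuck on how to put it together. Thanks in advance for any help.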

2 answers:

Answer 0 (score: 3)

First, create a new matrix that holds only the letter part of each entry:

> (charsmat.alpha <- substr(charsmat, 1, 1))
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "A"  "A"  "A"  "A"  "A"  "B" 
[2,] "A"  "B"  "B"  "B"  "B"  "B"

Then, subset charsmat to the columns in which the two rows of charsmat.alpha differ:

> charsmat[,(charsmat.alpha[1,] != charsmat.alpha[2,])]
     [,1] [,2] [,3] [,4]
[1,] "A1" "A1" "A2" "A2"
[2,] "B1" "B2" "B1" "B2"
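
The two steps above could be wrapped into a single function. This is only a minimal sketch: the name drop_same_letter_cols is hypothetical (it is not from the answer), and it assumes the letter is always the first character of each entry and that the matrix has exactly two rows. It also passes drop = FALSE so the result stays a matrix when only one column survives.

```r
## Hypothetical helper (not part of the original answer): keep only the
## columns whose two rows start with different letters.
drop_same_letter_cols <- function(mat) {
    ## First character of every entry; substr() keeps the matrix shape
    first <- substr(mat, 1, 1)
    ## TRUE for columns whose rows begin with different letters
    keep <- first[1, ] != first[2, ]
    ## drop = FALSE keeps a matrix even if a single column survives
    mat[, keep, drop = FALSE]
}

charsmat <- combn(c("A1", "A2", "B1", "B2"), 2)
result <- drop_same_letter_cols(charsmat)

## A case where only one column survives
single <- drop_same_letter_cols(matrix(c("A1", "B1", "A2", "A3"), nrow = 2))
```

Without drop = FALSE, the single-column case would come back as a plain character vector rather than a 2 x 1 matrix.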

Answer 1 (score: 1)

Here is a more general solution. It removes every column in which any letter in the row-1 entry matches any letter in the row-2 entry:

## Your data
chars <- c("A1","A2","B1","B2")
charsmat <- combn(chars, 2)

vetMatrix <- function(mat) {
    ## Remove non-alpha characters from matrix entries
    mm <- gsub("[^[:alpha:]]", "", mat)    
    ## Construct character class regex patterns from first row
    patterns <- paste0("[", mm[1,], "]")
    xs <- mm[2,]    
    ## Extract columns in which no character in first row is found in second
    mat[,!mapply("grepl", patterns, xs), drop=FALSE]
}

## Try it with your matrix ...
vetMatrix(charsmat)
#      [,1] [,2] [,3] [,4]
# [1,] "A1" "A1" "A2" "A2"
# [2,] "B1" "B2" "B1" "B2"

## ... and with a different matrix
mat <- matrix(c("AB1", "B1", "AA11", "BB22", "this", "that"), ncol=3) 
mat
#      [,1]  [,2]   [,3]  
# [1,] "AB1" "AA11" "this"
# [2,] "B1"  "BB22" "that"
vetMatrix(mat)
#      [,1]  
# [1,] "AA11"
# [2,] "BB22"
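
The same "any shared letter" test can also be written without building regex patterns, by comparing the sets of letters directly. This is a sketch of an alternative, not the answer's method: the names vetMatrixSets and shares_letter are hypothetical, and it assumes a two-row character matrix. One possible advantage is that an entry with no letters at all simply yields an empty set, whereas the regex version would build the pattern "[]" for it, which I would not expect to be a valid regular expression.

```r
## Hypothetical alternative to vetMatrix(): compare letter sets directly
vetMatrixSets <- function(mat) {
    ## Keep only the alphabetic characters of every entry
    mm <- gsub("[^[:alpha:]]", "", mat)
    ## TRUE when the two strings have at least one letter in common
    shares_letter <- function(a, b) {
        length(intersect(strsplit(a, "")[[1]], strsplit(b, "")[[1]])) > 0
    }
    ## One logical value per column
    shared <- vapply(seq_len(ncol(mat)),
                     function(j) shares_letter(mm[1, j], mm[2, j]),
                     logical(1))
    ## Keep the columns whose rows share no letter
    mat[, !shared, drop = FALSE]
}

charsmat <- combn(c("A1", "A2", "B1", "B2"), 2)
res1 <- vetMatrixSets(charsmat)

mat <- matrix(c("AB1", "B1", "AA11", "BB22", "this", "that"), ncol = 3)
res2 <- vetMatrixSets(mat)

## An entry with no letters shares nothing, so its column is kept
res3 <- vetMatrixSets(matrix(c("12", "B1"), nrow = 2))
```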