Comparing columns and finding values unique across columns in R

Time: 2014-06-17 05:08:39

Tags: r

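I have a 20 * 10 matrix and want to find the values that are unique across its columns, that is, values that occur only once in the whole matrix. A simple example is a matrix like this: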
mat = matrix(c("a","b","c","d","s","a","d","l","s","a","m","n"),ncol=3,dimnames=list(NULL,c("a","b","c")))

It looks like this:

     a   b   c  
[1,] "a" "s" "s"
[2,] "b" "a" "a"
[3,] "c" "d" "m"
[4,] "d" "l" "n"

Using unique does not give me what I want, because it keeps one copy of every value, including the repeated ones:

unique(c(mat))
#[1] "a" "b" "c" "d" "s" "l" "m" "n"

Desired result:

        a    b    c 
[1,] "NA" "NA" "NA" 
[2,] "b"  "NA" "NA" 
[3,] "c"  "NA"  "m" 
[4,] "NA"  "l"  "n"

1 answer:

Answer 0 (score: 2)

New answer: I hope this now gives you what you need... what you actually want is to find the non-duplicated items... :)

set.seed(1)
mat = matrix(c("a","b","c","d","s","a","d","l","s","a","m","n"),
             ncol=3,dimnames=list(NULL,c("a","b","c")))
mat
     a   b   c  
[1,] "a" "s" "s"
[2,] "b" "a" "a"
[3,] "c" "d" "m"
[4,] "d" "l" "n"

Now you have two options. The first is to find the values that occur only once and set everything else to NA:

notDuplicated = setdiff(c(mat),c(mat[duplicated(c(mat))]))
mat[!mat %in% notDuplicated] = NA 
mat
     a   b   c  
[1,] NA  NA  NA 
[2,] "b" NA  NA 
[3,] "c" NA  "m"
[4,] NA  "l" "n"
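
The same "occurs exactly once" idea can also be sketched by counting occurrences with table(). This is only an illustrative alternative and not part of the code above; the names m, counts, n_seen and res are made up for the example, and it works on its own copy so that mat is left untouched.

```r
# Illustrative sketch: keep only values that occur exactly once in the matrix.
# `m`, `counts`, `n_seen` and `res` are hypothetical names for this example.
m <- matrix(c("a","b","c","d","s","a","d","l","s","a","m","n"),
            ncol = 3, dimnames = list(NULL, c("a","b","c")))

counts <- table(m)                            # how often each value occurs overall
n_seen <- as.vector(counts[as.vector(m)])     # that count, looked up per cell
res <- m
res[n_seen > 1] <- NA                         # blank out values seen more than once
res
```

Because the counts are kept, the threshold can be changed if, for instance, values seen at most twice should survive.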

The second is to find the duplicated values and set them to NA directly (starting again from the original mat):

Duplicated = c(mat[duplicated(c(mat))])
mat[mat %in% Duplicated] = NA
mat
     a   b   c  
[1,] NA  NA  NA 
[2,] "b" NA  NA 
[3,] "c" NA  "m"
[4,] NA  "l" "n"
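
If this is needed more than once, the second approach could be wrapped in a small function. The sketch below is an assumption-laden variant, not the code above: keep_singletons and example are hypothetical names, and it calls duplicated() from both ends (fromLast = TRUE) instead of using %in%, since duplicated() on its own does not flag the first occurrence of a repeated value.

```r
# Hypothetical helper (not from the original answer): set every value that
# appears more than once anywhere in the matrix to NA.
keep_singletons <- function(m) {
  v <- as.vector(m)
  # duplicated() skips the first occurrence, so scan from both directions
  repeated <- duplicated(v) | duplicated(v, fromLast = TRUE)
  m[repeated] <- NA
  m
}

example <- matrix(c("a","b","c","d","s","a","d","l","s","a","m","n"),
                  ncol = 3, dimnames = list(NULL, c("a","b","c")))
keep_singletons(example)
```

The function returns a modified copy, so the input matrix stays unchanged and can be reused.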