R: Sort matrix by the number of values per row

Time: 2016-06-10 16:12:01

Tags: r matrix

I have a matrix with non-numeric values (missing values are blank, not NaN).

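# Fields are tab-separated; tabs are written as \t so the empty cells survive copy and paste
mat = read.table(textConnection(
"\ts1\ts2\ts3
g1\ta;b\ta\tb
g2\t\t\tb
g3\ta\t\ta;b"), row.names = 1, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
mat = as.matrix(mat)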

What I want to do is subset the matrix to keep the two rows that contain the most values.

So the result should be

g1  a;b  a  b # with three values
g3  a       a;b # with two values
# g2 should be excluded because it only has one value

My approach would be:

  • sort the matrix by the number of values per row
  • subset the sorted matrix

But I do not understand how to sort a matrix by the number of entries in each row.

Any ideas?

1 answer:

Answer 0 (score: 3)

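You can use apply over the rows to count how many elements in each row are non-empty strings, and then order the rows by that count. The sorted matrix looks like this: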

mat[order(apply(mat, 1, function(row) sum(row != "")), decreasing = TRUE), ]
   s1    s2  s3   
g1 "a;b" "a" "b"  
g3 "a"   ""  "a;b"
g2 ""    ""  "b"  
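
To get from the sorted matrix to the two rows the question asks for, one option is to take the first entries of the ordering. The following is only a sketch: the names n_keep, counts, top and top_rows are made up for illustration, and the matrix is rebuilt with matrix() so the snippet runs on its own.

```r
# Rebuild the example matrix directly so the sketch is self-contained.
mat <- matrix(
  c("a;b", "a", "b",
    "",    "",  "b",
    "a",   "",  "a;b"),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3"))
)

n_keep <- 2                                              # hypothetical: how many rows to keep
counts <- apply(mat, 1, function(row) sum(row != ""))    # non-empty cells per row
top <- order(counts, decreasing = TRUE)[seq_len(min(n_keep, nrow(mat)))]
top_rows <- mat[top, , drop = FALSE]                     # drop = FALSE keeps the result a matrix
print(top_rows)
```

With the example data this keeps g1 and g3. Capping the index with min(n_keep, nrow(mat)) avoids NA row indices when the matrix has fewer rows than requested.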

If the threshold is 2, you can also apply it directly inside the function, without sorting:

mat[apply(mat, 1, function(row) sum(row != "") >= 2), ]
   s1    s2  s3   
g1 "a;b" "a" "b"  
g3 "a"   ""  "a;b"

Another approach, suggested by @alexis_laz, is to use rowSums:

mat[rowSums(mat != "") >= 2, ]
   s1    s2  s3   
g1 "a;b" "a" "b"  
g3 "a"   ""  "a;b"
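
One caveat with any of these subsets: if only a single row passes the threshold, R drops the result to a plain character vector unless drop = FALSE is given. A small sketch of the difference, assuming a hypothetical stricter threshold of 3 (the variable names are illustrative, and the matrix is rebuilt so the snippet runs on its own):

```r
# Rebuild the example matrix directly so the sketch is self-contained.
mat <- matrix(
  c("a;b", "a", "b",
    "",    "",  "b",
    "a",   "",  "a;b"),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3"))
)

threshold <- 3                              # hypothetical: only g1 has three non-empty cells
keep <- rowSums(mat != "") >= threshold     # logical vector, one entry per row

as_vector <- mat[keep, ]                    # single row collapses to a named character vector
as_matrix <- mat[keep, , drop = FALSE]      # stays a 1 x 3 matrix with row name "g1"

print(is.matrix(as_vector))
print(is.matrix(as_matrix))
```

Using drop = FALSE is usually the safer choice when later code expects a matrix regardless of how many rows survive the filter.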