I have a matrix with non-numeric values (missing values are blank, not NA).
# fields are tab-separated; "\t" marks each tab, including the empty fields
mat = read.table(textConnection(
"\ts1\ts2\ts3
g1\ta;b\ta\tb
g2\t\t\tb
g3\ta\t\ta;b"), row.names = 1, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
mat = as.matrix(mat)
What I want to do is subset the matrix to select the rows with the highest number of non-empty values, i.e. the rows with at least two values.
So the result should be
g1 a;b a b # with three values
g3 a a;b # with two values
# g2 should be excluded because it only has one value
My approach would be
But I do not understand how to sort a matrix by the number of entries in each row.
Any ideas?
Answer (score: 3)
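You can try using apply over the rows to count how many elements in each row are not empty strings, and then sort by that count. The sorted matrix looks like this: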
mat[order(apply(mat, 1, function(row) sum(row != "")), decreasing = TRUE), ]
s1 s2 s3
g1 "a;b" "a" "b"
g3 "a" "" "a;b"
g2 "" "" "b"
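If you literally want the two rows with the most entries rather than a threshold, a minimal sketch is to take the first n rows of that sorted order with head(). This assumes "two highest" means the top n = 2 rows by number of non-empty entries; top_n_rows is a hypothetical helper name, and the data is rebuilt with read.table(text = ...) so the block runs on its own.

```r
# Rebuild the example matrix; "\t" escapes keep the blank tab-separated fields.
mat <- as.matrix(read.table(
  text = "\ts1\ts2\ts3\ng1\ta;b\ta\tb\ng2\t\t\tb\ng3\ta\t\ta;b",
  row.names = 1, header = TRUE, sep = "\t", stringsAsFactors = FALSE))

# Hypothetical helper: keep the n rows with the most non-empty entries.
# order() leaves tied counts in their original row order.
top_n_rows <- function(m, n = 2) {
  counts <- rowSums(m != "")
  m[head(order(counts, decreasing = TRUE), n), , drop = FALSE]
}

top_n_rows(mat, 2)
```

With ties at the cut-off, this keeps whichever tied rows come first in the original matrix, whereas the threshold versions keep every row that meets the threshold.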
If the threshold is 2, you can also put it directly in the function and skip the sorting:
mat[apply(mat, 1, function(row) sum(row != "") >= 2), ]
s1 s2 s3
g1 "a;b" "a" "b"
g3 "a" "" "a;b"
Another approach, suggested by @alexis_laz, is to use rowSums:
mat[rowSums(mat != "") >= 2, ]
s1 s2 s3
g1 "a;b" "a" "b"
g3 "a" "" "a;b"
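One caveat for all three versions: they assume empty cells are "". If some cells come in as NA instead (for instance, read.table turns a column that is entirely blank into a logical NA column, and as.matrix carries that NA into the character matrix), then mat != "" is NA for those cells, rowSums returns NA for the row, and subsetting with that NA produces a row of NAs. A sketch that treats NA the same as an empty string passes na.rm = TRUE to rowSums; mat_na below is a made-up matrix for illustration, not data from the question.

```r
# Hypothetical variant of the example where some cells are NA instead of "".
mat_na <- matrix(c("a;b", "a", "b",
                   NA,    "",  "b",
                   "a",   NA,  "a;b"),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))

# mat_na != "" is NA wherever the cell is NA; na.rm = TRUE leaves those
# cells out of the count, so they count as empty.
keep <- rowSums(mat_na != "", na.rm = TRUE) >= 2
mat_na[keep, , drop = FALSE]
```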