我有一个大矩阵,约300行和200000列。我想通过选择至少有一个值为&gt;的整列来缩小它。 0.5或小于-0.5(不仅仅是特定值)。我想保留行名和列名。通过tmp<-mymat > 0.5 | mymat < -0.5
,我能够得到一个真假的矩阵。我想提取其中至少有一个TRUE
的所有列。我只是尝试mymat[tmp]
,但这只返回满足该条件的值的向量。如何获得原始矩阵的实际列?感谢。
答案 0 :(得分:6)
试试这个:
> set.seed(007) # for the example being reproducible
> X <- matrix(rnorm(100), 20) # generating some data
> X <- cbind(X, runif(20, max=.48)) # generating a column with all values < 0.5
> colnames(X) <- paste('col', 1:ncol(X), sep='') # some column names
> X # this is how the matrix looks like
col1 col2 col3 col4 col5 col6
[1,] 2.287247161 0.83975036 1.218550535 0.07637147 0.342585350 0.335107187
[2,] -1.196771682 0.70534183 -0.699317079 0.15915528 0.004248236 0.419502015
[3,] -0.694292510 1.30596472 -0.285432752 0.54367418 0.029219842 0.346358090
[4,] -0.412292951 -1.38799622 -1.311552673 0.70480735 -0.393423429 0.212185020
[5,] -0.970673341 1.27291686 -0.391012431 0.31896914 -0.792704563 0.224824248
[6,] -0.947279945 0.18419277 -0.401526613 1.10924979 -0.311701865 0.415837389
[7,] 0.748139340 0.75227990 1.350517581 0.76915419 -0.346068592 0.057660111
[8,] -0.116955226 0.59174505 0.591190027 1.15347367 -0.304607588 0.007812921
[9,] 0.152657626 -0.98305260 0.100525456 1.26068350 -1.785893487 0.298192099
[10,] 2.189978107 -0.27606396 0.931071996 0.70062351 0.587274672 0.216225091
[11,] 0.356986230 -0.87085102 -0.262742349 0.43262716 1.635794434 0.026097800
[12,] 2.716751783 0.71871055 -0.007668105 -0.92260172 -0.645423474 0.190567072
[13,] 2.281451926 0.11065288 0.367153007 -0.61558421 0.618992169 0.402829397
[14,] 0.324020540 -0.07846677 1.707162545 -0.86665969 0.236393598 0.248196976
[15,] 1.896067067 -0.42049046 0.723740263 -1.63951709 0.846500899 0.406511129
[16,] 0.467680511 -0.56212588 0.481036049 -1.32583924 -0.573645739 0.162457572
[17,] -0.893800723 0.99751344 -1.567868244 -0.88903673 1.117993204 0.383801555
[18,] -0.307328300 -1.10513006 0.318250283 -0.55760233 -1.540001132 0.347037954
[19,] -0.004822422 -0.14228783 0.165991451 -0.06240231 -0.438123899 0.262938992
[20,] 0.988164149 0.31499490 -0.899907630 2.42269298 -0.150672971 0.139233120
>
> # defining a index for selecting if the condition is met
> ind <- apply(X, 2, function(X) any(abs(X)>0.5))
> X[,ind] # since col6 only has values less than 0.5 it is not taken
col1 col2 col3 col4 col5
[1,] 2.287247161 0.83975036 1.218550535 0.07637147 0.342585350
[2,] -1.196771682 0.70534183 -0.699317079 0.15915528 0.004248236
[3,] -0.694292510 1.30596472 -0.285432752 0.54367418 0.029219842
[4,] -0.412292951 -1.38799622 -1.311552673 0.70480735 -0.393423429
[5,] -0.970673341 1.27291686 -0.391012431 0.31896914 -0.792704563
[6,] -0.947279945 0.18419277 -0.401526613 1.10924979 -0.311701865
[7,] 0.748139340 0.75227990 1.350517581 0.76915419 -0.346068592
[8,] -0.116955226 0.59174505 0.591190027 1.15347367 -0.304607588
[9,] 0.152657626 -0.98305260 0.100525456 1.26068350 -1.785893487
[10,] 2.189978107 -0.27606396 0.931071996 0.70062351 0.587274672
[11,] 0.356986230 -0.87085102 -0.262742349 0.43262716 1.635794434
[12,] 2.716751783 0.71871055 -0.007668105 -0.92260172 -0.645423474
[13,] 2.281451926 0.11065288 0.367153007 -0.61558421 0.618992169
[14,] 0.324020540 -0.07846677 1.707162545 -0.86665969 0.236393598
[15,] 1.896067067 -0.42049046 0.723740263 -1.63951709 0.846500899
[16,] 0.467680511 -0.56212588 0.481036049 -1.32583924 -0.573645739
[17,] -0.893800723 0.99751344 -1.567868244 -0.88903673 1.117993204
[18,] -0.307328300 -1.10513006 0.318250283 -0.55760233 -1.540001132
[19,] -0.004822422 -0.14228783 0.165991451 -0.06240231 -0.438123899
[20,] 0.988164149 0.31499490 -0.899907630 2.42269298 -0.150672971
# It could be done just in one step avoiding 'ind'
X[, apply(X, 2, function(X) any(abs(X)>0.5))]
答案 1 :(得分:1)
Jilber对过滤后只剩下一列的情况的答案的补充:
X[, apply(X, 2, function(X) any(abs(X)>0.5)), drop=FALSE]
如果没有drop = FLASE参数,剩余的列将被转换为向量,您将丢失列名信息。