How to subset a data frame or matrix by rows with distinct values?

Time: 2015-06-18 23:23:12

Tags: r matrix filter dataframe subset

Suppose I have the following matrix:

mat <- matrix(c(1, 1, 2, 1, 2, 3, 2, 1, 3, 2, 2, 1), ncol = 3)
mat

Result:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    3    2
[3,]    2    2    2
[4,]    1    1    1

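How can I filter/subset this matrix according to whether each row contains duplicate values? For example, in this case I only want to keep rows 1 and 2.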

Any ideas would be much appreciated!

3 answers:

Answer 0 (score: 4)

Try this (I suspect it will be faster than any apply method):

mat[rowSums(mat == mat[, 1]) != ncol(mat), ]
# ---with your object---
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    3    2
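
One caveat worth checking: the comparison against `mat[, 1]` only removes rows in which every value equals the first column. That is enough for the example in the question, but a row with a partial repeat would be kept. A small sketch, using a hypothetical matrix `mat2` that is not from the question:

```r
# Hypothetical matrix: row 2 repeats a value but is not constant
mat2 <- rbind(c(1, 2, 3),
              c(1, 1, 2),
              c(2, 2, 2))

# TRUE only where every value in the row equals the first column
all_equal <- rowSums(mat2 == mat2[, 1]) == ncol(mat2)

# TRUE where the row contains any repeated value
has_dup <- apply(mat2, 1, anyDuplicated) > 0

all_equal  # FALSE FALSE  TRUE
has_dup    # FALSE  TRUE  TRUE
```

If rows like the second one should also be dropped, a per-row `duplicated()` or `anyDuplicated()` test is the safer choice.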

Answer 1 (score: 2)

This second one is just for fun. You can follow the logic to see how it works.

# `m` is the matrix from the question
# indx is TRUE for rows that contain no repeated value
indx <- apply(m, 1, function(x) !any(duplicated(x)))
m[indx, ]
#     [,1] [,2] [,3]
#[1,]    1    2    3
#[2,]    1    3    2
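
To follow the logic, it may help to trace it on single rows. A minimal sketch, rebuilding the question's matrix as `m` (the names `dup_row1` and `dup_row3` are only illustrative):

```r
# Matrix from the question
m <- matrix(c(1, 1, 2, 1, 2, 3, 2, 1, 3, 2, 2, 1), ncol = 3)

# duplicated() flags every element that repeats an earlier one
dup_row1 <- duplicated(m[1, ])  # row 1 is 1 2 3
dup_row3 <- duplicated(m[3, ])  # row 3 is 2 2 2

dup_row1        # FALSE FALSE FALSE
dup_row3        # FALSE  TRUE  TRUE
!any(dup_row1)  # TRUE: row 1 is kept
!any(dup_row3)  # FALSE: row 3 is dropped
```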

Answer 2 (score: 2)

Using the anyDuplicated function makes my approach a bit shorter, and it should also be faster.

mat[!apply(mat, 1, anyDuplicated), ]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    3    2
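
The question also mentions data frames. As a sketch, assuming all columns are numeric (so that `apply()` converts the data frame to a numeric matrix), the same test carries over. The names `df`, `keep`, and `result` are illustrative and not from the original post:

```r
# Same values as the question's matrix, stored as a data frame
df <- data.frame(a = c(1, 1, 2, 1),
                 b = c(2, 3, 2, 1),
                 c = c(3, 2, 2, 1))

# anyDuplicated() returns 0 when a row has no repeated value,
# otherwise the index of the first duplicate
keep <- apply(df, 1, anyDuplicated) == 0

result <- df[keep, ]
result
#   a b c
# 1 1 2 3
# 2 1 3 2
```

Comparing with `== 0` does the same job as the `!` in the matrix version, but makes the intent explicit instead of relying on integer-to-logical coercion.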