I have this matrix:
> Y
     [,1] [,2] [,3] [,4]
[1,] "0"  "2"  "9"  "5"
[2,] "4"  "7"  "7"  "3"
[3,] "1"  "5"  "7"  "9"
[4,] "7"  "8"  "7"  "4"
[5,] "7"  "8"  "7"  "4"
[6,] "1"  "1"  "7"  "2"
[7,] "7"  "8"  "7"  "4"
...
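I want to get all the repeated rows from this matrix, grouped by how often they occur: once, twice, three times, and so on.
For example, the row
"7" "8" "7" "4"
appears 3 times in Y. How can I find all the other cases?
So the output should be:
Return all rows that appear twice in Y.
Return all rows that appear 3 times in Y.
Return all rows that appear 4 or more times in Y.
I tried to solve this with the duplicated command, but that is not enough.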
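Answer 0 (score: 2)
Here is a simple solution based on concatenating each row of the matrix into a single string and then tabulating how often each string occurs.
First, we generate some simple dummy data. I generate random zeros and ones to make sure there will be plenty of duplicates: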
Y <- matrix(rbinom(100, 1, .5), ncol = 4)
head(Y)
#>      [,1] [,2] [,3] [,4]
#> [1,]    0    0    0    1
#> [2,]    0    0    0    0
#> [3,]    0    0    0    0
#> [4,]    0    0    0    1
#> [5,]    0    1    1    0
#> [6,]    0    0    1    0
# I collapse all the values in each row into a string, so c(0,1,0,1) becomes "0101"
row.ids <- apply(Y, 1, paste, collapse = "")
# Now using table() I can get the frequency with which each pattern appears
row.freqs <- table(row.ids)
# All triply replicated rows
Y[row.ids %in% names(row.freqs[row.freqs==3]),]
#>      [,1] [,2] [,3] [,4]
#> [1,]    0    0    0    1
#> [2,]    0    0    0    1
#> [3,]    0    1    1    0
#> [4,]    0    0    0    1
#> [5,]    0    1    1    0
#> [6,]    0    1    1    0
# All quadruply replicated rows
Y[row.ids %in% names(row.freqs[row.freqs==4]),]
#>       [,1] [,2] [,3] [,4]
#>  [1,]    0    0    0    0
#>  [2,]    0    0    0    0
#>  [3,]    0    0    1    0
#>  [4,]    0    0    1    0
#>  [5,]    0    0    0    0
#>  [6,]    0    0    1    0
#>  [7,]    0    1    1    1
#>  [8,]    0    1    1    1
#>  [9,]    0    1    1    1
#> [10,]    0    0    0    0
#> [11,]    0    1    1    1
#> [12,]    0    0    1    0
Created on 2019-02-20 by the reprex package (v0.2.1)
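The steps above can be wrapped in a small function that also covers the "4 or more times" case from the question. This is only a sketch: the name rows_with_count and its at_least argument are hypothetical, not part of the answer. It joins the values with a tab rather than an empty string, on the assumption that the values themselves contain no tabs, so that rows such as c("1", "12") and c("11", "2") do not end up with the same key.

```r
# Hypothetical helper (not from the answer above): return the rows of a matrix
# whose pattern occurs exactly n times, or at least n times.
rows_with_count <- function(Y, n, at_least = FALSE) {
  # Join each row into one key; the tab separator is an assumption that
  # no value contains a tab, and avoids "1","12" colliding with "11","2".
  row.ids <- apply(Y, 1, paste, collapse = "\t")
  # Frequency of each distinct row pattern
  row.freqs <- table(row.ids)
  keep <- if (at_least) row.freqs >= n else row.freqs == n
  # drop = FALSE keeps the result a matrix even when a single row matches
  Y[row.ids %in% names(row.freqs)[as.vector(keep)], , drop = FALSE]
}

# The first seven rows of the character matrix shown in the question
Y_question <- matrix(c("0", "2", "9", "5",
                       "4", "7", "7", "3",
                       "1", "5", "7", "9",
                       "7", "8", "7", "4",
                       "7", "8", "7", "4",
                       "1", "1", "7", "2",
                       "7", "8", "7", "4"), ncol = 4, byrow = TRUE)

three    <- rows_with_count(Y_question, 3)                   # exactly 3 times
two_plus <- rows_with_count(Y_question, 2, at_least = TRUE)  # 2 or more times
once     <- rows_with_count(Y_question, 1)                   # unique rows
```

On these seven rows, three and two_plus both contain the three copies of "7" "8" "7" "4", and once contains the four rows that are not repeated.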
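Answer 1 (score: 1)
Using the test matrix Y shown in the Note at the end, use aggregate to create a data frame ag whose rows are the unique rows of Y, followed by a count of how many times each one occurs.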
ag <- aggregate(cbind(count = apply(Y, 1, toString)) ~ ., as.data.frame(Y),
                FUN = length)
nc <- ncol(Y)
subset(ag, count == 2, select = -count) # shows rows which occur twice
split(ag[1:nc], ag$count) # splits unique rows into those that occur once, twice, etc.
Note
The test matrix Y used above is:
Y <- matrix(c(0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1,
              1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1,
              0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0,
              0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0,
              0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1), 25, 4)
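The question also asks for rows that occur "N or more" times, which the count column handles with an inequality. The sketch below is a variant rather than the answer's exact call: it adds an explicit count column of ones and sums it, and it uses a small hypothetical matrix Y_small so that the counts are easy to check by eye. The column names V1 and V2 are the defaults that as.data.frame assigns to an unnamed matrix.

```r
# Hypothetical 6 x 2 example: row (0, 1) occurs 3 times,
# row (1, 1) twice, and row (1, 0) once.
Y_small <- matrix(c(0, 1,
                    1, 1,
                    0, 1,
                    0, 1,
                    1, 0,
                    1, 1), ncol = 2, byrow = TRUE)

# One data frame row per matrix row, plus a column of ones to sum
df <- as.data.frame(Y_small)
df$count <- 1

# One row per unique pattern, with the number of times it occurs
ag_small <- aggregate(count ~ ., data = df, FUN = sum)

# Unique rows that occur at least twice
at_least_two <- subset(ag_small, count >= 2)

# Unique rows grouped by how often they occur
by_count <- split(ag_small[c("V1", "V2")], ag_small$count)
```

Here by_count is a list with elements named "1", "2" and "3", one per distinct count, and replacing count >= 2 with count >= 4 gives the "4 or more" case on a larger matrix.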