I have a vector with some repeated elements. I want to list the groups of indices that share the same value.
That is, given the input
x <- c(1,2,3,2,4,3,2)
I would like to produce the list
duplicate_x <- list(c(2,4,7), c(3,6))
How can I solve this in R?
Answer 0 (score: 3)
<input>
Answer 1 (score: 3)
You can also do it this way:
dupEle <- unique(x[duplicated(x)])
lapply(dupEle, function(ele) which(x == ele))
[[1]]
[1] 2 4 7
[[2]]
[1] 3 6
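The result above is an unnamed list, so it does not say which value each group of indices belongs to. A minimal sketch of one way to label the groups, assuming the same x as in the question; the setNames step and the name dupIdx are my additions for illustration, not part of the answer:

```r
x <- c(1, 2, 3, 2, 4, 3, 2)

# Values that occur more than once, in order of their first repeat
dupEle <- unique(x[duplicated(x)])

# Indices of each repeated value, labeled with the value itself
# (dupIdx is a hypothetical name; setNames() coerces the values to character)
dupIdx <- setNames(lapply(dupEle, function(ele) which(x == ele)), dupEle)
dupIdx
```

With this, dupIdx[["2"]] returns the positions at which the value 2 occurs.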
Answer 2 (score: 1)
I ran a speed test on the two proposed solutions.
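# Solution 1: flag every element that has a duplicate, then split indices by value
find_dups <- function(x) {
  dups <- duplicated(x) | duplicated(x, fromLast = TRUE)
  split(which(dups), x[dups])
}
# Solution 2: find the repeated values, then look up the indices of each one
find_dups2 <- function(x) {
  dupEle <- unique(x[duplicated(x)])
  lapply(dupEle, function(ele) which(x == ele))
}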
dups_small = c(1,2,3,2,4,3,2)
set.seed(1)
dups_large = sample(0:9, size = 50, replace = TRUE)
Results:
> microbenchmark::microbenchmark(find_dups(dups_small),
+ find_dups2(dups_small))
Unit: microseconds
expr min lq mean median uq max neval cld
find_dups(dups_small) 53.833 55.589 69.43074 57.491 66.4140 288.183 100 b
find_dups2(dups_small) 13.166 15.215 28.82765 16.677 20.0415 523.119 100 a
> microbenchmark::microbenchmark(find_dups(dups_large),
+ find_dups2(dups_large))
Unit: microseconds
expr min lq mean median uq max neval cld
find_dups(dups_large) 50.615 52.079 59.88706 53.834 64.9515 149.212 100 b
find_dups2(dups_large) 25.747 28.965 37.14842 31.014 34.3780 289.354 100 a
So the second solution (find_dups2, the unique/lapply approach) is actually considerably faster, by a factor of roughly 2 to 3.5.
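One caveat when comparing the two: they do not return exactly the same object. find_dups returns a list named by value and ordered by value, while find_dups2 returns an unnamed list ordered by first repeat. A rough sketch of a check that the groups themselves agree; the helper normalize and the test vector are hypothetical, introduced only for this illustration:

```r
find_dups <- function(x) {
  dups <- duplicated(x) | duplicated(x, fromLast = TRUE)
  split(which(dups), x[dups])
}
find_dups2 <- function(x) {
  dupEle <- unique(x[duplicated(x)])
  lapply(dupEle, function(ele) which(x == ele))
}

# Hypothetical helper: reshape find_dups2 output to look like find_dups output,
# i.e. named by the repeated value and sorted by that value
normalize <- function(x, groups) {
  vals <- vapply(groups, function(idx) x[idx[1]], numeric(1))
  groups <- setNames(groups, vals)
  groups[order(vals)]
}

x <- c(3, 2, 3, 2, 1)  # assumed test input where the two orderings differ
a <- find_dups(x)
b <- normalize(x, find_dups2(x))
identical(a, b)
```

If labels or a particular ordering matter, the extra normalization step adds some cost to find_dups2 that the benchmark above does not include.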