How can I test whether a vector in R contains duplicate elements?
Answer 0 (score: 16)
I think I found the answer. Use the duplicated() function:
a <- c(3, 5, 7, 2, 7, 9)
b <- 1:10
any(duplicated(a))  # TRUE: 7 appears twice
any(duplicated(b))  # FALSE
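duplicated() returns a logical vector rather than a single answer, so it can also tell you which elements repeat. A short base-R sketch; the name all_copies is only illustrative:

```r
a <- c(3, 5, 7, 2, 7, 9)

# duplicated() flags every occurrence after the first one
duplicated(a)            # FALSE FALSE FALSE FALSE TRUE FALSE

# The repeated values themselves
a[duplicated(a)]         # 7

# Flag all copies of a repeated value, including the first,
# by combining a forward scan with a backward scan
all_copies <- duplicated(a) | duplicated(a, fromLast = TRUE)
all_copies               # FALSE FALSE TRUE FALSE TRUE FALSE
```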
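Answer 1 (score: 4)
Also try rle(x), which finds the lengths of runs of identical values in x. Note that rle() only groups adjacent values, so sort x first if the duplicates may be anywhere in the vector.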
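A minimal sketch of the rle() approach, assuming you want to detect repeats anywhere in the vector. has_dup_rle() is a hypothetical helper name, not a base R function; also note that sort() drops NA values by default, so NAs are ignored here:

```r
# Hypothetical helper: TRUE if any value occurs more than once.
# rle() only groups adjacent equal values, so sort first.
has_dup_rle <- function(x) {
  runs <- rle(sort(x))      # run lengths of the sorted vector
  any(runs$lengths > 1L)    # a run longer than 1 means a repeated value
}

has_dup_rle(c(3, 5, 7, 2, 7, 9))   # TRUE
has_dup_rle(1:10)                  # FALSE

# Without sorting, non-adjacent repeats are missed
any(rle(c(7, 2, 7))$lengths > 1L)  # FALSE
```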
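Answer 2 (score: 2)
If you are looking for consecutive duplicates, you can use diff(); a difference of 0 marks two equal neighbouring values: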
a <- 1:10
b <- c(1:5, 5, 7, 8, 9, 10)
diff(a)  # all 1s: no consecutive duplicates
diff(b)  # contains a 0: the two 5s are adjacent
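To turn the diff() output into a single TRUE/FALSE answer, compare it with 0. A sketch using the b vector from this answer; the variable d and the which() step are illustrative additions:

```r
b <- c(1:5, 5, 7, 8, 9, 10)

d <- diff(b)           # differences between neighbouring elements
any(d == 0)            # TRUE: the two 5s are adjacent
which(d == 0) + 1L     # 6: position of the second 5

any(diff(1:10) == 0)   # FALSE: no consecutive duplicates
```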
Or, for duplicates anywhere in the vector:
length(a) == length(unique(a))  # TRUE: no duplicates
length(b) == length(unique(b))  # FALSE: b contains duplicates
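The two checks in this answer differ on repeats that are not adjacent. A small illustrative vector (x is a made-up example) shows the difference, and shows that sorting first lets the diff() check see such repeats:

```r
x <- c(1, 2, 1)   # a repeat, but not an adjacent one

any(diff(x) == 0)               # FALSE: no adjacent repeat
length(x) != length(unique(x))  # TRUE: 1 appears twice
any(diff(sort(x)) == 0)         # TRUE: sorting makes the repeats adjacent
```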
Answer 3 (score: 0)
Try this:
> all(diff(c(1,2,3)))
[1] TRUE
Warning message:
In all(diff(c(1, 2, 3))) : coercing argument of type 'double' to logical
> all(diff(c(1,2,2,3)))
[1] FALSE
Warning message:
In all(diff(c(1, 2, 2, 3))) : coercing argument of type 'double' to logical
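You can add an explicit cast, such as as.logical(), to get rid of the warning. Note that all(diff(x)) returns TRUE when there are no consecutive duplicates; sort x first to detect duplicates anywhere in the vector.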
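A sketch of a warning-free variant, assuming the goal is to detect duplicates anywhere: comparing the differences with 0 yields a logical vector directly (wrapping them in as.logical() would work too). no_dups() is a hypothetical helper name; like all(diff(...)) in this answer, it returns TRUE when there are no duplicates:

```r
# Hypothetical helper: TRUE when x has no repeated values.
# "!= 0" produces a logical vector, so all() has nothing to coerce.
no_dups <- function(x) all(diff(sort(x)) != 0)

no_dups(c(1, 2, 3))        # TRUE
no_dups(c(1, 2, 4, 2, 3))  # FALSE
```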
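Answer 4 (score: 0)
As Hadley mentioned in the comments, anyDuplicated() is faster for very long vectors because it stops as soon as it finds the first duplicate. It returns the index of that first duplicate, or 0 if there is none.
Example: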
a <- c(3, 5, 7, 2, 7, 9)
b <- 1:10
anyDuplicated(a) != 0L  # TRUE
anyDuplicated(b) != 0L  # FALSE
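anyDuplicated() returns the position of the first element that repeats an earlier one, or 0 if there is none, which is why the examples compare with 0L. A short sketch; the vector long is a made-up input with its repeat at the front:

```r
a <- c(3, 5, 7, 2, 7, 9)

anyDuplicated(a)      # 5: the second 7 is the first repeat
anyDuplicated(1:10)   # 0: no duplicates

# A repeat near the front is reported at its position;
# the rest of the long vector does not need to be examined for the answer
long <- c(1, 1, seq_len(1e6))
anyDuplicated(long)   # 2
```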
Benchmark with 1 million observations:
set.seed(2011)
x <- sample(1e7, size = 1e6, replace = TRUE)
bench::mark(
ZNN = any(duplicated(x)),
RL = length(x) != length(unique(x)),
BUA = !all(diff(sort(x))),
AD = anyDuplicated(x) != 0L
)
# A tibble: 4 x 13
  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result    memory            time     gc
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>    <list>            <list>   <list>
1 ZNN         64.62ms  70.04ms      11.5    11.8MB     0        8     0      693ms <lgl [1]> <df[,3] [2 x 3]>  <bch:tm> <tibble [8 x 3]>
2 RL          66.95ms  70.67ms      12.5    15.4MB     0        7     0      561ms <lgl [1]> <df[,3] [3 x 3]>  <bch:tm> <tibble [7 x 3]>
3 BUA         84.66ms  87.79ms      10.6      42MB     3.54     3     1      283ms <lgl [1]> <df[,3] [11 x 3]> <bch:tm> <tibble [4 x 3]>
4 AD           2.45ms   2.87ms     314.        8MB     5.98   105     2      335ms <lgl [1]> <df[,3] [1 x 3]>  <bch:tm> <tibble [107 x 3]>
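If you want to confirm that the four benchmarked expressions agree without installing bench, here is a base-R sketch (the checks list and the test vectors are illustrative). It uses integer inputs because all() on a double vector emits the coercion warning shown in answer 3:

```r
# The four benchmarked expressions, each returning TRUE
# when x contains at least one repeated value
checks <- list(
  ZNN = function(x) any(duplicated(x)),
  RL  = function(x) length(x) != length(unique(x)),
  BUA = function(x) !all(diff(sort(x))),
  AD  = function(x) anyDuplicated(x) != 0L
)

with_dup    <- c(3L, 5L, 7L, 2L, 7L, 9L)
without_dup <- 1:10

# vapply() runs every check on the same input and collects the results
vapply(checks, function(f) f(with_dup), logical(1))     # all TRUE
vapply(checks, function(f) f(without_dup), logical(1))  # all FALSE
```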
Benchmark with 100 observations:
set.seed(2011)
x <- sample(1e7, size = 100, replace = TRUE)
bench::mark(
ZNN = any(duplicated(x)),
RL = length(x) != length(unique(x)),
BUA = !all(diff(sort(x))),
AD = anyDuplicated(x) != 0L
)
# A tibble: 4 x 13
  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result    memory            time     gc
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>    <list>            <list>   <list>
1 ZNN          7.14us   8.93us    60429.    1.48KB     6.04  9999     1    165.5ms <lgl [1]> <df[,3] [2 x 3]>  <bch:tm> <tibble [10,000 x 3]>
2 RL           8.03us   9.37us    83754.    1.92KB     0    10000     0    119.4ms <lgl [1]> <df[,3] [3 x 3]>  <bch:tm> <tibble [10,000 x 3]>
3 BUA         54.89us  61.58us     8317.    4.83KB     6.74  3701     3      445ms <lgl [1]> <df[,3] [11 x 3]> <bch:tm> <tibble [3,704 x 3]>
4 AD            5.8us   6.69us   123838.    1.05KB     0    10000     0     80.8ms <lgl [1]> <df[,3] [1 x 3]>  <bch:tm> <tibble [10,000 x 3]>