Question

I have a vector: a <- sample(1:5, 20, replace = TRUE)

I find the frequency of each value with:

tabulate(a)
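The question relies on how `tabulate()` lays out its result, so here is a small sketch of that behaviour. The input vectors are made up for illustration, and `counts` is a hypothetical name. Per base R's documentation, `tabulate()` counts each integer from 1 to `nbins` (default `max(1, bin)`), so position k of the result holds the count for the value k.

```r
# tabulate() counts occurrences of each integer 1, 2, ..., nbins,
# where nbins defaults to max(1, bin). Position k of the result is
# therefore the count for the value k; absent values get a 0.
counts <- tabulate(c(2L, 3L, 3L, 5L))
print(counts)
# [1] 0 1 2 0 1

# Values below 1 fall outside the bins and are silently dropped.
print(tabulate(c(-1L, 0L, 2L)))
# [1] 0 1
```

This is why, for positive integer data, a position in the `tabulate()` output is also the value being counted.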

Now I want to find the positions of the most frequent values.

Let's say the vector is:

[1] 3 3 3 5 2 2 4 1 4 2 5 1 2 1 3 1 3 2 5 1

tabulate returns:

[1] 5 5 5 2 3
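To make the example reproducible, this sketch hard-codes the vector shown above (instead of relying on `sample()`) and checks the `tabulate()` output; `tab` is a hypothetical name.

```r
# The example vector from the question, written out explicitly
a <- c(3, 3, 3, 5, 2, 2, 4, 1, 4, 2, 5, 1, 2, 1, 3, 1, 3, 2, 5, 1)
tab <- tabulate(a)
print(tab)
# [1] 5 5 5 2 3
```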

Now I find the highest count in the tabulate output with max(tabulate(a)), which returns:

[1] 5

There are 3 values with a frequency of 5. I want to know the positions of these values in the tabulate output.
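Counting how many values share the maximum frequency is a one-liner: comparing the counts to their maximum gives a logical vector, and `sum()` counts the `TRUE`s. This sketch reuses the example vector from the question; `top` and `n_modes` are hypothetical names.

```r
a <- c(3, 3, 3, 5, 2, 2, 4, 1, 4, 2, 5, 1, 2, 1, 3, 1, 3, 2, 5, 1)
tab <- tabulate(a)
top <- max(tab)              # highest count: 5
n_modes <- sum(tab == top)   # number of values tied at that count: 3
```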

i.e., the first three entries of the tabulate output.

Answer 1

It may be easier to use table:

x <- table(a)
x
# a
# 1 2 3 4 5 
# 5 5 5 2 3 
names(x)[x == max(x)]
# [1] "1" "2" "3"
which(a %in% names(x)[x == max(x)])
# [1]  1  2  3  5  6  8 10 12 13 14 15 16 17 18 20
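Note that `names(x)` returns character strings, so the modal values above are `"1" "2" "3"`, not numbers. A minimal sketch, assuming the data are numeric, that converts them back before matching against the original vector (`modes` and `pos` are hypothetical names):

```r
a <- c(3, 3, 3, 5, 2, 2, 4, 1, 4, 2, 5, 1, 2, 1, 3, 1, 3, 2, 5, 1)
x <- table(a)
# names(x) are character, so convert the modal values back to numbers
modes <- as.numeric(names(x)[x == max(x)])
print(modes)
# [1] 1 2 3
# Positions in the original vector that hold one of the modal values
pos <- which(a %in% modes)
print(pos)
# [1]  1  2  3  5  6  8 10 12 13 14 15 16 17 18 20
```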

Alternatively, a similar approach with tabulate:

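x <- tabulate(a)
# assumes a contains every integer from 1 to max(a); if any value in that
# range is missing, sort(unique(a)) is shorter than x and the two misalign
sort(unique(a))[x == max(x)]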
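Since the question asks for positions in the `tabulate()` output, `which()` returns them directly, and for positive integer data those positions are also the modal values. The sketch below also shows, with a made-up vector `a2` that has gaps (no 2s or 4s), how `sort(unique(a))` can misalign with the `tabulate()` counts; `tab`, `pos`, `tab2`, `wrong` and `right` are hypothetical names.

```r
# Positions of the most frequent values in the tabulate() output
a <- c(3, 3, 3, 5, 2, 2, 4, 1, 4, 2, 5, 1, 2, 1, 3, 1, 3, 2, 5, 1)
tab <- tabulate(a)
pos <- which(tab == max(tab))
print(pos)
# [1] 1 2 3

# With gaps in the values, sort(unique(a2)) has 3 elements but
# tabulate(a2) has 5 bins, so logical indexing picks the wrong values.
a2 <- c(1L, 1L, 3L, 3L, 5L)
tab2 <- tabulate(a2)                          # 2 0 2 0 1
wrong <- sort(unique(a2))[tab2 == max(tab2)]  # 1 5 (misaligned)
right <- which(tab2 == max(tab2))             # 1 3
```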

Below are some benchmarks on numeric and character vectors. The performance difference is more pronounced with numeric data.

Sample data

set.seed(1)
a <- sample(20, 1000000, replace = TRUE)
b <- sample(letters, 1000000, replace = TRUE)

Benchmark functions

t1 <- function() {
  x <- table(a)
  out1 <- names(x)[x == max(x)]
  out1
}

t2 <- function() {
  x <- tabulate(a)
  out2 <- sort(unique(a))[x == max(x)]
  out2
}

t3 <- function() {
  x <- table(b)
  out3 <- names(x)[x == max(x)]
  out3
}

t4 <- function() {
  x <- tabulate(factor(b))
  out4 <- sort(unique(b))[x == max(x)]
  out4
}
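Before comparing timings it is worth confirming that the four functions agree. This sketch repeats the setup on a smaller sample (1e5 draws instead of 1e6, an arbitrary choice to keep it quick) and checks that `t1()`/`t2()` and `t3()`/`t4()` return the same modal values. It assumes every integer from 1 to 20 occurs in `a`, which `t2()` relies on.

```r
set.seed(1)
a <- sample(20, 1e5, replace = TRUE)
b <- sample(letters, 1e5, replace = TRUE)

t1 <- function() { x <- table(a); names(x)[x == max(x)] }
t2 <- function() { x <- tabulate(a); sort(unique(a))[x == max(x)] }
t3 <- function() { x <- table(b); names(x)[x == max(x)] }
t4 <- function() { x <- tabulate(factor(b)); sort(unique(b))[x == max(x)] }

# t1() returns character names, t2() returns integers: compare as integers
stopifnot(identical(as.integer(t1()), t2()))
# Both character versions return the same sorted letters
stopifnot(identical(t3(), t4()))
```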

Results

library(rbenchmark)
benchmark(t1(), t2(), t3(), t4(), replications =  50)
#   test replications elapsed relative user.self sys.self user.child sys.child
# 1 t1()           50  30.548   24.244    30.416    0.064          0         0
# 2 t2()           50   1.260    1.000     1.240    0.016          0         0
# 3 t3()           50   8.919    7.079     8.740    0.160          0         0
# 4 t4()           50   5.680    4.508     5.564    0.100          0         0
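If rbenchmark is not installed, base R's `system.time()` gives a rough comparison. This sketch times a `table()` approach against `tabulate()` plus `which()` over a few repetitions; the sample size, `reps` and the `time_*` names are arbitrary choices for illustration, and the timings will differ from the figures above depending on the machine.

```r
set.seed(1)
a <- sample(20, 1e5, replace = TRUE)
reps <- 5  # hypothetical number of repetitions

time_table <- system.time(
  for (i in seq_len(reps)) { x <- table(a); names(x)[x == max(x)] }
)
time_tabulate <- system.time(
  for (i in seq_len(reps)) { x <- tabulate(a); which(x == max(x)) }
)

# Elapsed wall-clock seconds for each approach
elapsed <- c(table = time_table[["elapsed"]],
             tabulate = time_tabulate[["elapsed"]])
print(elapsed)
```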

Determining the position of the i-th element in a vector
