I have a numeric vector:
[1] 96.500 96.625 96.750 96.875 97.000 97.125 97.250 97.375 97.500 97.625 97.750 97.875 98.000
[14] 98.125 98.250 98.375 98.500 98.625 98.750 98.875 99.000 99.125 99.250 99.375 99.500 99.625
[27] 99.750 99.875 100.000 100.125 100.250 100.375 100.500
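I want to take a different number, say 99.49, and find the index numbers of the values in the vector that it falls between. In this case I would like it to return c(24, 25), because the number of interest lies between 99.375 and 99.5.
Does anyone know a simple way (one or two lines of code) to do this in R? Assume that the number of interest may itself be in the vector. I currently have a "while" loop, but I am trying to see whether there is a simpler vectorized form.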
Answer 0: (score: 2)
Here x is your vector and v is the given number:
between <- function(x, v) {
c(max(which(x <= v)), min(which(x >= v)))
}
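As a quick check, here is a minimal sketch that applies the function above to the vector from the question. The vector is rebuilt with seq() purely for illustration (an assumption; the question only shows the printed values), and the function is repeated so the snippet runs on its own.

```r
# Rebuild the question's vector: 96.5 to 100.5 in steps of 0.125 (33 values).
x <- seq(96.5, 100.5, by = 0.125)

# Same definition as in the answer above.
between <- function(x, v) {
  c(max(which(x <= v)), min(which(x >= v)))
}

between(x, 99.49)  # 24 25: 99.49 lies between x[24] = 99.375 and x[25] = 99.5
between(x, 99.5)   # 25 25: an exact match returns the same index twice
```

Note that if v lies outside the range of x, one of the which() calls is empty, so max() or min() returns -Inf or Inf with a warning.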
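Answer 1: (score: 1)
Below is an efficient version of match() for numeric data. It is efficient because my C++ implementation short-circuits: it stops searching as soon as it finds the first match. Perhaps I am overlooking something, but I really think a function like this is missing from base R, and I run into this problem every now and then.
Note, however, that depending on the problem it may be more efficient to first sort the target vector (and the vector to be matched) and then use findInterval(), as suggested in the comments.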
Rcpp::cppFunction('
IntegerVector match_dbl_cpp(NumericVector x, NumericVector table,
int nomatch, double tolerance) {
int n = x.size();
int m = table.size();
IntegerVector out(n, nomatch);
for (int i = 0; i < n; ++i) {
int j = 0;
while (j < m) {
if (std::abs(x[i] - table[j]) < tolerance) {
out[i] = j + 1;
break;
}
++j;
}
}
return out;
}
')
match_dbl <- function(x, table, nomatch = NA_integer_,
tolerance = sqrt(.Machine$double.eps)) {
if (!is.integer(nomatch))
stop("'nomatch' must be an integer")
if (!is.numeric(tolerance) || tolerance <= 0.0)
stop("'tolerance' must be a positive number")
match_dbl_cpp(x, table, nomatch, tolerance)
}
# generate some random numeric data
set.seed(123)
table <- runif(1000L)
table <- sample(c(table, table)) # 'table' now contains duplicates
x <- sample(table, 100L)
m1 <- match(x, table)
m1_dbl <- match_dbl(x, table)
identical(m1, m1_dbl) # TRUE according to expectation
[1] TRUE
microbenchmark::microbenchmark(match(x, table),
match_dbl(x, table)) # speed is fine
Unit: microseconds
expr min lq mean median uq max neval
match(x, table) 45.622 48.6295 52.54944 49.5540 53.995 129.079 100
match_dbl(x, table) 46.380 48.9325 53.13952 49.6335 52.054 106.160 100
# minimally disturb x
x <- x + runif(n = length(x), min = -1e-10, max = 1e-10)
identical(m1, match(x, table)) # now FALSE
[1] FALSE
identical(m1_dbl, match_dbl(x, table)) # still TRUE
[1] TRUE
identical(m1_dbl, match_dbl(x, table, tolerance = 1e-11)) # also FALSE now
[1] FALSE
A version of %in% for numeric data can easily be written as:
`%in_dbl%` <- function(x, table) match_dbl(x, table, nomatch = 0L) > 0L
Suggestions for improvement are warmly welcome!
Answer 2: (score: 0)
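For the original question, the findInterval() route mentioned in this answer might look roughly like the sketch below. The helper name bracket is hypothetical, the vector is rebuilt with seq() for illustration, and the sketch assumes x is sorted in increasing order and that v lies within the range of x.

```r
# The question's vector, rebuilt for illustration (an assumption).
x <- seq(96.5, 100.5, by = 0.125)

# Hypothetical helper: indices of the two neighbours of v in sorted x.
# Assumes min(x) <= v <= max(x); values outside that range are not handled.
bracket <- function(x, v) {
  lo <- findInterval(v, x)   # index of the largest element of x that is <= v
  hi <- lo + (x[lo] != v)    # same index on an exact match, otherwise the next one
  c(lo, hi)
}

bracket(x, 99.49)  # 24 25
bracket(x, 99.5)   # 25 25
```

Because findInterval() uses a binary search on the sorted vector, this avoids scanning every element, which is the efficiency point made above.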
z <- scan(text = "
96.500 96.625 96.750 96.875 97.000 97.125 97.250 97.375 97.500 97.625 97.750 97.875 98.000
98.125 98.250 98.375 98.500 98.625 98.750 98.875 99.000 99.125 99.250 99.375 99.500 99.625
99.750 99.875 100.000 100.125 100.250 100.375 100.500
")
btw <- function(data, num){
c(min(which(num<data))-1, min(which(num<data)))
}
btw(data = z, num = 99.49)
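To see how this behaves, here is a small self-contained sketch. It rebuilds the vector with seq() (an assumption, for illustration) and uses an equivalent form of btw that computes the upper index only once.

```r
# The question's vector, rebuilt for illustration (an assumption).
z <- seq(96.5, 100.5, by = 0.125)

# Equivalent to the definition above, with the upper index stored in a variable.
btw <- function(data, num) {
  hi <- min(which(num < data))  # first element strictly greater than num
  c(hi - 1, hi)
}

btw(data = z, num = 99.49)  # 24 25
btw(data = z, num = 99.5)   # 25 26: an exact match is reported as the lower index
```

Because the comparison is strict, a number that is already in the vector is paired with its right-hand neighbour rather than with itself, and a number equal to or above max(z) leaves which() empty.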