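I wrote a function that uses sign() to find which numbers in a given vector are positive or negative. I'm wondering whether there is a simple way to get a character vector (e.g. "+" and "-") without using sign().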
Answer 0 (score: 1)
What is "hard" about "working around the sign() function"?
Here are a few options. Most of them look pretty simple, but you can use whichever one you like.
cut(x, breaks = c(-Inf, 0, Inf), labels = c("-", "+"))
factor(sign(x), levels = c(-1, 1), labels = c("-", "+"))
ifelse(x < 0, -1, 1) # returns numeric -1/1; use ifelse(x < 0, "-", "+") for a character result
ifelse(sign(x) == -1, "-", "+")
c("+", "-")[(x < 0) + 1L]
sub("1", "+", sub("-1", "-", sign(x))) # from comments
You may want to make sure that the behavior for an input of 0 is what you want or expect.
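To make the point about 0 concrete, here is a small sketch that runs a few of the options on a vector containing a zero, with labels ordered so that negatives map to "-". The helper `sign_label` at the end is hypothetical (it is not part of the original answer) and is only one possible way to give 0 its own label.

```r
x <- c(-2, 0, 3)

# cut(): with the default right = TRUE, 0 falls in (-Inf, 0] and is labelled "-"
res_cut <- as.character(cut(x, breaks = c(-Inf, 0, Inf), labels = c("-", "+")))
# "-" "-" "+"

# factor(sign(x)): sign(0) is 0, which matches no level, so the result is NA
res_factor <- as.character(factor(sign(x), levels = c(-1, 1), labels = c("-", "+")))
# "-" NA "+"

# vector indexing: 0 is not < 0, so it is labelled "+"
res_index <- c("+", "-")[(x < 0) + 1L]
# "-" "+" "+"

# double sub(): "0" contains no "1", so it passes through unchanged
res_sub <- sub("1", "+", sub("-1", "-", sign(x)))
# "-" "0" "+"

# Hypothetical helper: give 0 an explicit label without calling sign()
sign_label <- function(x) c("-", "0", "+")[(x > 0) - (x < 0) + 2L]
res_three <- sign_label(x)
# "-" "0" "+"
```

Which of these is "right" depends on what a zero should mean in your data, so it is worth picking deliberately rather than by accident.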
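Optimization probably doesn't make much sense here, since it's hard to imagine this being a code bottleneck and even the slower methods finish quickly. Still, for general educational purposes, we can compare the methods: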
n = 1000
x = runif(n, min = -1, max = 1)
print(microbenchmark::microbenchmark(
  cut = cut(x, breaks = c(-Inf, 0, Inf), labels = c("-", "+")),
  factor = factor(sign(x), levels = c(-1, 1), labels = c("-", "+")),
  ifelse_direct = ifelse(x < 0, -1, 1),  # numeric output, unlike the others
  ifelse_sign = ifelse(sign(x) == -1, "-", "+"),
  vector_index = c("+", "-")[(x < 0) + 1L],
  double_sub = sub("1", "+", sub("-1", "-", sign(x))),
  times = 10
), order = "mean")
# Unit: microseconds
#          expr      min       lq      mean    median       uq      max neval cld
#  vector_index   13.650   14.542   14.9753   15.1135   15.600   16.202    10 a
# ifelse_direct   62.070   64.065   83.4343   64.7030   68.473  170.470    10 a
#   ifelse_sign  193.101  197.737  225.5119  203.9010  209.966  354.551    10 b
#           cut  189.734  190.560  244.9517  207.7210  240.709  472.329    10 b
#        factor  514.649  516.468  571.2281  541.8715  553.215  899.395    10 c
#    double_sub 1295.653 1309.340 1376.3982 1381.7635 1420.775 1502.250    10 d
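The vector-indexing method is probably the least readable, but I included it because I guessed it would be the most efficient, and it is, by about a factor of 5. Unsurprisingly, the rest seem to get slower as they go from simple to complex. This isn't entirely fair, because the outputs are of different classes. If we coerce everything to factor, the ifelse_direct method gets slower, but the direct indexing method is still the fastest, now by about a factor of 7.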
print(microbenchmark::microbenchmark(
  cut = cut(x, breaks = c(-Inf, 0, Inf), labels = c("-", "+")),
  factor = factor(sign(x), levels = c(-1, 1), labels = c("-", "+")),
  # levels must match the numeric -1/1 values, otherwise every element becomes NA
  ifelse_direct = factor(ifelse(x < 0, -1, 1), levels = c(-1, 1), labels = c("-", "+")),
  ifelse_sign = factor(ifelse(sign(x) == -1, "-", "+"), levels = c("-", "+")),
  vector_index = factor(c("+", "-"), levels = c("-", "+"))[(x < 0) + 1L],
  double_sub = factor(sub("1", "+", sub("-1", "-", sign(x))), levels = c("-", "+")),
  times = 10
), order = "mean")
# Unit: microseconds
#          expr      min       lq      mean    median       uq      max neval cld
#  vector_index   22.968   24.742   29.5399   26.5030   33.719   41.736    10 a
#   ifelse_sign  205.342  206.831  214.7748  211.4585  217.641  237.253    10 b
#           cut  203.333  228.458  242.2857  234.2420  255.290  324.423    10 b
#        factor  516.720  519.264  539.4255  524.8190  541.624  609.298    10 c
# ifelse_direct  568.426  570.917  575.7954  573.8430  577.363  599.899    10 d
#    double_sub 1316.820 1320.598 1333.2738 1326.0780 1343.518 1363.342    10 e
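Timing comparisons like the ones above only mean much if the methods return the same labels. The following is a rough sanity check, not part of the original answer: it assumes character output with "-" for negatives and "+" for positives, drops any exact zeros (where the methods are known to disagree), and uses the vector-indexing result as the reference. The names `ref`, `results`, and `all_agree` are made up for this sketch.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x <- runif(1000, min = -1, max = 1)
x <- x[x != 0]  # the methods treat 0 differently, so exclude it here

# Reference result from the vector-indexing approach
ref <- c("+", "-")[(x < 0) + 1L]

# Character-valued versions of the other approaches
results <- list(
  cut = as.character(cut(x, breaks = c(-Inf, 0, Inf), labels = c("-", "+"))),
  factor = as.character(factor(sign(x), levels = c(-1, 1), labels = c("-", "+"))),
  ifelse_direct = ifelse(x < 0, "-", "+"),
  ifelse_sign = ifelse(sign(x) == -1, "-", "+"),
  double_sub = sub("1", "+", sub("-1", "-", sign(x)))
)

# TRUE for every method whose output is identical to the reference
all_agree <- vapply(results, function(r) identical(r, ref), logical(1))
print(all_agree)
```

If any entry comes back FALSE, the usual suspect is a swapped pair of labels, which is easy to do with cut() and ifelse().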