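I have a vector with a large number of numeric values (> 1E9 elements) and want to derive the numeric precision (the number of digits in a number) and the numeric scale (the number of digits to the right of the decimal point).
How can I do this very fast (vectorized)?
There is a question with a partial answer (how to return number of decimal places in R), but that solution is neither fast (vectorized) nor does it compute the numeric precision.
Example: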
# small example vector with numeric data
x <- c(7654321, 54321.1234, 321.123, 321.123456789)
> numeric.precision(x) # implementation is the answer
[1] 7, 9, 6, 12
> numeric.scale(x) # implementation is the answer
[1] 0, 4, 3, 9
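Optional "sugar" (added to this question later, thx to @thc and @gregor):
How can I avoid counting too many digits because of the imprecise internal representation of numbers in the computer (e.g. floating point)?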
> x = 54321.1234
> as.character(x)
[1] "54321.1234"
> print(x, digits = 22)
[1] 54321.12339999999676365
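The output above reflects that a double carries only about 15 to 17 significant decimal digits. A minimal sketch of one way to sidestep the noise, assuming that 15 significant digits are enough for the data at hand (that threshold is an assumption, not something stated in the question): ask format() for at most 15 significant digits before counting characters.

```r
x <- 54321.1234

# At most 15 significant digits: format() uses only as many as are needed
s15 <- format(x, digits = 15)

# 22 significant digits expose the binary representation error
s22 <- format(x, digits = 22)

print(s15)
print(s22)
print(nchar(s22) > nchar(s15))
```

Counting digits on s15 instead of s22 keeps the representation error out of the result, at the price of ignoring anything beyond the 15th significant digit.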
Answer 0 (score: 3)
Here is a base R method to get started. It is surely too slow, but it at least computes the desired results.
# precision
nchar(sub(".", "", x, fixed=TRUE))
[1] 7 9 6 12
# scale
nchar(sub("\\d+\\.?(.*)$", "\\1", x))
[1] 0 4 3 9
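The two one-liners above can be wrapped into the functions the question asks for. This is only a sketch: the function names are taken from the question's example calls, the handling of a leading minus sign is an addition that is not part of the answer, and values that as.character() renders in scientific notation are assumed not to occur.

```r
# Hypothetical wrappers around the string-based approach
numeric.scale <- function(x) {
  s <- as.character(x)                  # same conversion that sub() applies implicitly
  nchar(sub("^-?\\d+\\.?", "", s))      # drop sign, integer digits and decimal point
}

numeric.precision <- function(x) {
  s <- as.character(x)
  nchar(gsub("[-.]", "", s))            # drop sign and decimal point, count what is left
}

x <- c(7654321, 54321.1234, 321.123, 321.123456789)
print(numeric.precision(x))
print(numeric.scale(x))
```

Both functions are vectorized over x, but they still pay for the conversion to character, which is the slow part for very long vectors.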
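For this method, I suggest using the colClasses argument of fread from data.table, so that the values are never converted to numeric in the first place, which avoids the floating-point precision issues:
library(data.table)
x <- unlist(fread("7654321
54321.1234
321.123
321.123456789", colClasses="character"), use.names=FALSE)
As noted in the comments, it may still be necessary to convert the vector to numeric during input, for example when some of the input values are written in scientific notation in the text file. In that case, as described in this answer, a formatting statement or an option such as options(scipen = 999) may be needed to force the conversion from that format to standard decimal notation.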
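To illustrate the scientific-notation case, here is a small sketch that uses a formatting statement. The helper name to.fixed and the choice of 15 significant digits are assumptions made for this example. Each element is formatted on its own, because format() applied to a whole vector pads all elements to a common number of decimals, which would distort the scale.

```r
# Hypothetical helper: fixed (non-scientific) notation, one element at a time
to.fixed <- function(v) {
  vapply(v, format, character(1), digits = 15, scientific = FALSE)
}

x <- c(1e5, 1e-5, 54321.1234)
s <- to.fixed(x)
print(s)
```

The element-wise call is slow for very long vectors, so this is meant to show the conversion, not to be the fast path.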
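Answer 1 (score: 1)
Here is just the idea as a mathematical version (faster than manipulating characters). You can put this into two functions, scale and precision, where the precision function calls the scale function.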
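# Loop over the elements; seq_along() is also safe for an empty vector
for (i in seq_along(x)) {
# scale: shift the decimal point to the right until no fractional part is left
after <- 0
while(x[i]*(10^after) != round(x[i]*(10^after)))
{ after <- after + 1 }
cat(sprintf("Scale: %s\n", after))
# number of digits to the left of the decimal point
before <- floor(log10(abs(x[i])))+1
cat(sprintf("Precision: %s\n", before+after))
}
Result: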
Scale: 0
Precision: 7
Scale: 4
Precision: 9
Scale: 3
Precision: 6
Scale: 9
Precision: 12
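The loop above handles one element at a time. A possible vectorized variant of the same idea is sketched below, with precision calling scale as the answer suggests. The function names and the max.scale cap are assumptions of this sketch, and zero is not handled in the precision part (log10(0) is -Inf), just as in the loop.

```r
# Hypothetical vectorized version of the multiply-and-round test
numeric.scale.math <- function(x, max.scale = 15L) {
  scale <- rep(NA_integer_, length(x))
  open <- which(!is.na(x))                # indices whose scale is still unknown
  for (k in 0:max.scale) {
    if (length(open) == 0L) break
    shifted <- x[open] * 10^k
    hit <- shifted == round(shifted)      # same test as in the while loop
    scale[open[hit]] <- k
    open <- open[!hit]                    # keep only the unresolved elements
  }
  scale                                   # NA where no scale <= max.scale was found
}

numeric.precision.math <- function(x) {
  before <- floor(log10(abs(x))) + 1      # digits left of the decimal point
  before + numeric.scale.math(x)
}

x <- c(7654321, 54321.1234, 321.123, 321.123456789)
print(numeric.scale.math(x))
print(numeric.precision.math(x))
```

The outer loop runs at most max.scale + 1 times regardless of the vector length, and every pass works on the whole remaining subset at once.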
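Answer 2 (score: 0)
This just merges all comments and answers into a ready-to-use solution that also takes different countries (locales) and NA into account.
I am posting this as an answer (credit goes to @Imo, @Gregor et al.).
Edit (Feb 9, 2017): added SQL.precision as a return value, since it may differ from the mathematical precision.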
#' Calculates the biggest precision and scale that occurs in a numeric vector
#'
#' The scale of a numeric is the count of decimal digits in the fractional part (to the right of the decimal point).
#' The precision of a numeric is the total count of significant digits in the whole number,
#' that is, the number of digits to both sides of the decimal point.
#'
#' To create a suitable numeric data type in a SQL data base use the returned \code{SQL.precision} which
#' is defined by \code{max(precision, non.fractional.precision + scale)}.
#'
#' @param x numeric vector
#'
#' @return A list with four elements:
#' precision (total number of significant digits in the whole number),
#' scale (number of digits in the fractional part),
#' non.fractional.precision (number of digits to the left of the decimal point),
#' SQL.precision (precision to use for a numeric data type in a SQL data base).
#'
#' @details NA values are ignored. If x contains no non-NA values, precision 1 and scale 0 are returned!
#'
#' @examples
#'
#' \preformatted{
#' x <- c(0, 7654321, 54321.1234, 321.123, 321.123456789, 54321.1234, 100000000000, 1E4, NA)
#' numeric.precision.and.scale(x)
#' numeric.precision.and.scale(c(10.0, 1.2)) # shows why the SQL.precision is different
#' }
numeric.precision.and.scale <- function(x) {
# Remember the current option and restore it on exit (also when x has no non-NA values)
old.scipen <- getOption("scipen")
on.exit(options(scipen = old.scipen), add = TRUE)
# Overwrite options
options(scipen = 999) # avoid scientific notation when converting numerics to strings
# Extract the decimal point character of the computer's current locale
decimal.sign <- substr( 1 / 2, 2, 2)
x.string <- as.character(x[!is.na(x)])
if (length(x.string) > 0) {
# calculate
precision <- max(nchar(sub(decimal.sign, "", x.string, fixed = TRUE)))
scale <- max(nchar(sub(paste0("\\d+\\", decimal.sign, "?(.*)$"), "\\1", x.string)))
non.fractional.precision <- max(trunc(log10(abs(x))) + 1, na.rm = TRUE)
SQL.precision <- max(precision, non.fractional.precision + scale)
} else {
precision <- 1
scale <- 0
non.fractional.precision <- 1
SQL.precision <- 1
}
return(list(precision = precision,
scale = scale,
non.fractional.precision = non.fractional.precision,
SQL.precision = SQL.precision))
}
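Why SQL.precision can differ from precision is easiest to see with the vector c(10.0, 1.2) from the examples. The sketch below repeats the four computations of the function step by step for that vector. It assumes the default decimal sign "." and is only an illustration, not a replacement for the function.

```r
x <- c(10.0, 1.2)
x.string <- as.character(x)    # "10" and "1.2"

# Longest digit string after removing the decimal point: "10" and "12"
precision <- max(nchar(sub(".", "", x.string, fixed = TRUE)))

# Longest fractional part: "" and "2"
scale <- max(nchar(sub("\\d+\\.?(.*)$", "\\1", x.string)))

# Most digits to the left of the decimal point: 2 (from 10)
non.fractional.precision <- max(trunc(log10(abs(x))) + 1)

# A column must hold the widest integer part and the widest fraction together
SQL.precision <- max(precision, non.fractional.precision + scale)

print(c(precision, scale, non.fractional.precision, SQL.precision))
```

Here precision is 2 and scale is 1, but a column with precision 2 and scale 1 could not store 10.0, so the SQL precision has to be 2 + 1 = 3.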