Question

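I have a vector with a very large number of numeric elements (> 1E9) and want to derive the numeric precision (the number of digits in a number) and the numeric scale (the number of digits to the right of the decimal point).

How can I do this very fast (vectorized)?

There is a question with a partial answer (how to return number of decimal places in R), but the solution there is neither fast (vectorized) nor does it calculate the numeric precision.

Example: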

# small example vector with numeric data
x <- c(7654321, 54321.1234, 321.123, 321.123456789)

> numeric.precision(x)  # implementation is the answer
[1] 7, 9, 6, 12

> numeric.scale(x)      # implementation is the answer
[1] 0, 4, 3, 9

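Optional "sugar" (added to the question later, thx to @thc and @gregor):

How can I avoid counting too many digits because numbers are stored inexactly in the computer (e.g. as floating point)?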

> x = 54321.1234
> as.character(x)
[1] "54321.1234"
> print(x, digits = 22)
[1] 54321.12339999999676365

Answer 1

Here is a base R approach to start with. It is surely too slow, but at least it computes the desired results.

# precision
nchar(sub(".", "", x, fixed=TRUE))
[1]  7  9  6 12

# scale
nchar(sub("\\d+\\.?(.*)$", "\\1", x))
[1] 0 4 3 9
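
The two one-liners can be wrapped into the functions the question asks for. This is only a minimal sketch: it assumes every value converts to a plain decimal string via as.character() (no scientific notation, no minus sign, no NA), and the function bodies are just the expressions above.

```r
# Sketch only: function names are taken from the question,
# the bodies are the one-liners from this answer.
numeric.precision <- function(x) {
  # drop the decimal point and count the remaining characters
  nchar(sub(".", "", as.character(x), fixed = TRUE))
}

numeric.scale <- function(x) {
  # keep only what follows the integer part and the optional decimal point
  nchar(sub("\\d+\\.?(.*)$", "\\1", as.character(x)))
}

x <- c(7654321, 54321.1234, 321.123, 321.123456789)
numeric.precision(x)
numeric.scale(x)
```

Both functions are vectorized because sub() and nchar() are, but the conversion to character is still the expensive part for very long vectors.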

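For this approach, I would recommend using the colClasses argument of data.table's fread to avoid the conversion to numeric, and with it the numeric precision issues, in the first place:

x <- unlist(fread(text = "7654321 54321.1234 321.123 321.123456789", header = FALSE, colClasses = "character"), use.names = FALSE)

It may be necessary to convert the vector to numeric during input, as noted in the comments, for example when some of the input values are written in scientific notation in the text file. In that case it may be necessary to use a formatting statement or options(scipen = 999) to force the conversion from this format to standard decimal format, as described in this answer.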
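
As an illustration of the "formatting statement" route, here is a small sketch that turns numbers into plain decimal strings with format(). The helper name to.decimal.string is made up for this example, and digits = 15 is an assumption chosen so that floating point noise beyond 15 significant digits is not printed. It formats element by element, so it only shows the idea and is not meant for 1E9 values.

```r
# Hypothetical helper (not part of the answer): format each value on its own,
# because format() on a whole vector pads all elements to a common width.
to.decimal.string <- function(x) {
  vapply(x,
         function(v) format(v, scientific = FALSE, digits = 15),
         character(1),
         USE.NAMES = FALSE)
}

x <- c(1e10, 1e-5, 54321.1234)
to.decimal.string(x)
```

The resulting strings can then be fed into the nchar(sub(...)) expressions above.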

Answer 2

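Here is the idea as a math version (faster than character manipulation). You could put this into two functions, scale and precision, where the precision function calls the scale function.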

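for (i in seq_along(x)) {
  # scale: multiply by 10 until the value is a whole number
  after <- 0
  while (x[i] * 10^after != round(x[i] * 10^after)) {
    after <- after + 1
  }
  cat(sprintf("Scale: %s\n", after))
  # precision: digits left of the decimal point plus the scale
  before <- floor(log10(abs(x[i]))) + 1
  cat(sprintf("Precision: %s\n", before + after))
}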

Result:

Scale: 0
Precision: 7
Scale: 4
Precision: 9
Scale: 3
Precision: 6
Scale: 9
Precision: 12
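
The loop above handles one element at a time. A vectorized variant of the same idea could look like the sketch below, where precision calls scale as suggested. The names num.scale and num.precision, and the cap of 15 decimal places (max.scale), are assumptions for illustration and not part of the answer. Elements that do not become whole numbers within the cap are returned as NA.

```r
# Sketch only: names and the max.scale cap are illustrative assumptions.
num.scale <- function(x, max.scale = 15L) {
  res  <- rep(NA_integer_, length(x))
  todo <- !is.na(x)                      # elements whose scale is still unknown
  for (after in 0:max.scale) {
    idx  <- which(todo)
    if (length(idx) == 0L) break
    y    <- x[idx] * 10^after
    done <- y == round(y)                # whole number reached at this scale
    res[idx[done]]  <- after
    todo[idx[done]] <- FALSE
  }
  res
}

num.precision <- function(x, max.scale = 15L) {
  # digits left of the decimal point; note that log10(0) is -Inf
  before <- floor(log10(abs(x))) + 1
  before + num.scale(x, max.scale)
}

x <- c(7654321, 54321.1234, 321.123, 321.123456789)
num.scale(x)
num.precision(x)
```

Each pass works on all unresolved elements at once, so the number of R-level iterations is bounded by max.scale rather than by the length of the vector.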

Answer 3

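Just to merge all comments and answers into one ready-to-use solution that also considers different countries (locales) and NA, I am posting this as an answer (please give credit to @Imo, @Gregor et al.).

Edit (Feb. 9, 2017): Added SQL.precision as a return value, since it can differ from the mathematical precision.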

#' Calculates the biggest precision and scale that occurs in a numeric vector
#'
#' The scale of a numeric is the count of decimal digits in the fractional part (to the right of the decimal point).
#' The precision of a numeric is the total count of significant digits in the whole number,
#' that is, the number of digits to both sides of the decimal point. 
#'
#' To create a suitable numeric data type in a SQL data base use the returned \code{SQL.precision} which
#' is defined by \code{max(precision, non.fractional.precision + scale)}.
#'
#' @param x numeric vector
#'
#' @return A list with four elements:
#'         precision (total number of significant digits in the whole number),
#'         scale (number of digits in the fractional part),
#'         non.fractional.precision (number of digits to the left of the decimal point),
#'         and SQL.precision.
#'
#' @details NA values are ignored. If x is empty or contains only NA,
#'          precision 1 and scale 0 are returned!
#'
#' @examples
#'
#' \preformatted{
#' x <- c(0, 7654321, 54321.1234, 321.123, 321.123456789, 54321.1234, 100000000000, 1E4, NA)
#' numeric.precision.and.scale(x)
#' numeric.precision.and.scale(c(10.0, 1.2))   # shows why the SQL.precision is different
#' }
numeric.precision.and.scale <- function(x) {

  # Avoid scientific notation when converting numerics to strings.
  # on.exit() restores the previous setting on every exit path,
  # including the "no non-NA values" branch and errors.
  old.options <- options(scipen = 999)
  on.exit(options(old.options), add = TRUE)

  # Extract the decimal point character of the computer's current locale
  decimal.sign <- substr( 1 / 2, 2, 2)

  x.string <- as.character(x[!is.na(x)])

  if (length(x.string) > 0) {
    # calculate
    precision <- max(nchar(sub(decimal.sign, "", x.string, fixed = TRUE)))
    scale <- max(nchar(sub(paste0("\\d+\\", decimal.sign, "?(.*)$"), "\\1", x.string)))
    non.fractional.precision <- max(trunc(log10(abs(x))) + 1, na.rm = TRUE)
    SQL.precision <- max(precision, non.fractional.precision + scale)
  } else {
    precision <- 1
    scale <- 0
    non.fractional.precision <- 1
    SQL.precision <- 1
  }

  return(list(precision = precision,
              scale = scale,
              non.fractional.precision = non.fractional.precision,
              SQL.precision = SQL.precision))
}
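
To see why SQL.precision can differ from the mathematical precision, take c(10.0, 1.2): each number has at most two digits, but a single SQL decimal column needs two digits in front of the decimal point and one behind it. The sketch below recomputes the pieces step by step with the same expressions used in the function. The variable names are only illustrative, and it assumes "." as decimal sign.

```r
# Illustration only: same formulas as in numeric.precision.and.scale()
x        <- c(10.0, 1.2)
x.string <- as.character(x)                                   # "10" "1.2"

digits <- nchar(sub(".", "", x.string, fixed = TRUE))         # digits per value
scale  <- nchar(sub("\\d+\\.?(.*)$", "\\1", x.string))        # decimals per value
left   <- trunc(log10(abs(x))) + 1                            # digits left of the point

precision     <- max(digits)
sql.precision <- max(precision, max(left) + max(scale))

precision      # 2
sql.precision  # 3, e.g. a column declared as DECIMAL(3, 1)
```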

Quickly get the numeric precision and scale (number of decimal places) of a numeric vector