Convert superscripted numbers from strings into scientific notation (from Unicode, UTF-8)

Time: 2019-06-03 13:05:36

Tags: r superscript

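I imported a vector of p-values from an Excel sheet. The numbers are given as strings containing superscript Unicode characters. After hours of trying, I still cannot convert them to numeric values.

See the example below. A simple conversion with as.numeric() does not work. I also tried a regex to capture the superscripted numbers, but it turns out that each superscript digit has its own distinct Unicode code point, and I found no built-in translation for them.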

test <- c("0.0126", "0.000289", "4.26x10⁻¹⁴", "6.36x10⁻⁴⁸", 
          "4.35x10⁻⁹", "0.115", "0.0982", "0.000187", "0.0484", "0.000223")

as.numeric(test)
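
To make the problem concrete, here is a minimal base-R sketch that inspects the code points involved. The vector `sup` is an illustrative assumption (it is not part of the imported data) and is written with `\u` escapes so that it does not depend on the file encoding. It suggests that the superscript digits are not contiguous in Unicode, which is why a simple offset or character range does not translate them.

```r
# Superscript 0-9 followed by the superscript minus, written as escapes
sup <- c("\u2070", "\u00b9", "\u00b2", "\u00b3", "\u2074", "\u2075",
         "\u2076", "\u2077", "\u2078", "\u2079", "\u207b")

# utf8ToInt() returns the code point of a single character
codes <- vapply(sup, utf8ToInt, integer(1), USE.NAMES = FALSE)

# Superscript 1, 2 and 3 sit in the Latin-1 block, the rest near U+2070
print(sprintf("U+%04X", codes))

# as.numeric() cannot parse such a string: it returns NA with a warning
failed <- suppressWarnings(as.numeric("4.26x10\u207b\u00b9\u2074"))
print(failed)
```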

Does anyone know of an R package that handles this translation easily, or do I have to translate the code points into digits one by one?

1 Answer:

Answer 0: (score: 1)

This format is definitely not very portable... Still, as an exercise, here is a possible solution:

test <- c("0.0126", "0.000289", "4.26x10⁻¹⁴", "6.36x10⁻⁴⁸",
          "4.35x10⁻⁹", "0.115", "0.0982", "0.000187", "0.0484",
          "0.000223")

library(utf8)
library(stringr)

# normalize, i.e. map compatibility characters such as superscripts
# to their plain-text equivalents
testnorm <- utf8_normalize(test, map_case = TRUE, map_compat = TRUE)

# replace the exponent part
# \\N{Minus Sign} refers to the Unicode character named MINUS SIGN (U+2212)
# (see [ICU regex](http://userguide.icu-project.org/strings/regexp))
# it is needed because normalization maps the superscript minus to U+2212,
# which is not the ASCII hyphen-minus "-"
testnorm <- str_replace_all(testnorm, "x10\\N{Minus Sign}", "e-")

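# convert the cleaned strings to numbers; as.numeric() understands "e-14"
# notation, so eval(parse(text = ...)) is not needed on imported data
p_vals <- as.numeric(testnorm)

# without scientific notation, format() pads every value to the number of
# decimals needed by the smallest one (the "e-48" element)
format(p_vals, digits = 2, scientific = FALSE)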
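
For comparison, here is a dependency-free sketch of the same idea in base R. It is not the answer's method: it swaps the NFKC normalization for an explicit `chartr()` lookup table. The helper name `sup_to_numeric` is hypothetical, and the sketch assumes the exponent is always written as `x10` followed by superscript characters at the end of the string. The input uses `\u` escapes for the superscripts so that the example does not depend on the file encoding.

```r
# Same values as in the question, with superscripts written as escapes
test <- c("0.0126", "0.000289",
          "4.26x10\u207b\u00b9\u2074",   # 4.26 x 10^-14
          "6.36x10\u207b\u2074\u2078",   # 6.36 x 10^-48
          "4.35x10\u207b\u2079",         # 4.35 x 10^-9
          "0.115", "0.0982", "0.000187", "0.0484", "0.000223")

# Hypothetical helper (an assumption, not part of any package)
sup_to_numeric <- function(x) {
  # superscript 0-9, minus and plus, in the same order as their ASCII twins
  old <- paste0("\u2070\u00b9\u00b2\u00b3\u2074\u2075",
                "\u2076\u2077\u2078\u2079\u207b\u207a")
  new <- "0123456789-+"
  x <- chartr(old, new, x)
  # turn a trailing "x10<signed exponent>" into R's "e" notation
  x <- sub("x10([-+]?[0-9]+)$", "e\\1", x)
  as.numeric(x)
}

p_vals <- sup_to_numeric(test)
print(format(p_vals, digits = 3, scientific = TRUE))
```

The explicit table keeps the translation visible and leaves plain decimals such as "0.115" untouched, at the cost of only handling the characters listed in `old`.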