Question

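I am fairly sure that what I am looking for is a regular expression in R for reading scientific notation. Below is what I have done, along with the details. I would greatly appreciate any help.

I have a text file in which some numbers are in scientific notation and others are plain decimals or integers. I am trying to read them into R using a regular expression. I wrote a function to do this, and it works as long as the numbers are neither negative nor in scientific notation.

The function I wrote is: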

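getBig <- function(fileName, rows, columns)
{
  # read the whole file into a single string
  dat <- readChar(fileName, file.info(fileName)$size)
  # locate every run of digits and dots in the string
  m <- gregexpr('[0-9][/.0-9]+', dat, perl = TRUE)

  s <- regmatches(dat, m)
  s <- s[[1]]
  s <- s[-1] # the first element is the list size
  # fill column-wise, then transpose to get a rows x columns matrix
  S <- matrix(s, ncol = rows, nrow = columns)
  S <- t(S)
  return(S)
}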

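I tried to extend the regular expression to cover negative numbers and scientific notation by swapping the pattern below into the function above, but without success. Does anyone know where I went wrong? Any help is appreciated. A sample of the file format is also given below.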

 m <- gregexpr(' [-+]?[0-9]*(/.?[0-9]*([eE][-+]?[0-9]?))?',dat,perl = TRUE)

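My intended reading of the pattern:

[-+]?  an optional + or -

[0-9]*  digits 0-9, zero or more times

(  start of a block: /.? an optional decimal point, then [0-9]* digits matched zero or more times

(  start of another block: [eE] e or E, [-+]? an optional + or -, [0-9]* digits 0-9 one or more times

)?)?  close both blocks and make them optional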

In the file format below, the header gives rows, columns,

and (rowN, rowN, rowN) refers to columns 1-3 of row N, i.e.

[3,1]    ((1,1,-1),-2.542611418857958448210085379141884323299379672715620518130686999531487002844642281770330354890802745e-05,8.586192002176000052697976968885158408090751670240233300961472896241959822732337130019333683974778635e-05))

Answer 1

Based on "Regex for numbers on scientific notation?", the following works in R:

A regular expression for numbers in scientific notation only:

only_sci_notation_numbers_regex <- "^(-?[0-9]*)\\.?[0-9]+[eE]?[-\\+]?[0-9]+$"

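A regular expression for numbers in scientific notation as well as plain decimals or integers:

all_numbers_regex <- "^(-?[0-9]*)((\\.?[0-9]+[eE]?[-\\+]?[0-9]+)|(\\.[0-9]+))*$"

Some examples of strings that do and do not match: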

examples_match <- c(
  "0", "1", "1.5", "0.2", "-0", "-1", "-1.5", "-0.2", ".1", "-.1",
  "1.05E+10", "1.05e+10", "-1.05E+10", "-1.05e+10", "1.05E-10", "1.05e-10", "-1.05E-10", "-1.05e-10",
  ".1e5", ".1E5", "-.1e5", "-.1E5")

examples_not_match <- c("1.", "1.e5", "1e5.")

# matches only numbers in scientific notation (so not examples 1-10)
lapply(examples_match, function(x) grepl(only_sci_notation_numbers_regex, x))

# matches numbers in scientific and non-scientific notation
lapply(examples_match, function(x) grepl(all_numbers_regex, x))

# doesn't match mis-formatted numbers
lapply(examples_not_match, function(x) grepl(only_sci_notation_numbers_regex, x))
lapply(examples_not_match, function(x) grepl(all_numbers_regex, x))
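
Because grepl is vectorized over its input, the same checks can also be written without lapply. The following is only a minimal sketch: it redefines the two patterns from this answer so that it runs on its own, and the vector name examples and the choice of test strings are assumptions made for illustration.

```r
# the two patterns from the answer, repeated so the snippet is self-contained
only_sci_notation_numbers_regex <- "^(-?[0-9]*)\\.?[0-9]+[eE]?[-\\+]?[0-9]+$"
all_numbers_regex <- "^(-?[0-9]*)((\\.?[0-9]+[eE]?[-\\+]?[0-9]+)|(\\.[0-9]+))*$"

# a few test strings chosen for illustration
examples <- c("1.5", "-0.2", "1.05E+10", "-.1e5", "1.", "1.e5", "10")

# grepl() takes a character vector and returns one logical per element
sci_only <- grepl(only_sci_notation_numbers_regex, examples)
any_num  <- grepl(all_numbers_regex, examples)

# show the results side by side
print(data.frame(examples, sci_only, any_num))
```

One thing this makes visible: because [eE] is optional in the first pattern, a multi-digit integer such as "10" also passes the "scientific only" check, so it may be worth tightening that part if plain integers must be excluded.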

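These regular expressions assume that the complete string represents your number. If you want to match scientific or non-scientific numbers that make up only part of a string (for example, to extract them from a longer string with stringr::str_extract), you must remove the leading ^ and the trailing $ from the corresponding expression.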
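
Extracting numbers embedded in a longer string can be sketched in base R with gregexpr and regmatches, the functions the question already uses. Note that the pattern below is not one of the expressions from this answer: it is a stricter unanchored variant written for this illustration. The names number_pattern, tokens and values, as well as the shortened sample line, are assumptions.

```r
# shortened version of the sample line from the question (digits truncated)
line <- "[3,1]    ((1,1,-1),-2.5426e-05,8.5862e-05))"

# optional sign, then digits with an optional fraction (or a bare fraction),
# then an optional exponent; no ^ or $ so it can match inside a longer string
number_pattern <- "[-+]?(?:[0-9]+(?:\\.[0-9]+)?|\\.[0-9]+)(?:[eE][-+]?[0-9]+)?"

# perl = TRUE is needed for the non-capturing groups (?: ... )
m <- gregexpr(number_pattern, line, perl = TRUE)
tokens <- regmatches(line, m)[[1]]

# as.numeric() understands both plain and scientific notation
values <- as.numeric(tokens)

print(tokens)
print(values)
```

In this sample the first two tokens come from the "[3,1]" header, so they would have to be dropped before the remaining values are reshaped into a matrix.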
