R: as.numeric gives incorrect integers when reading an ASCII file

Date: 2014-11-01 11:11:01

Tags: r ascii

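I have read an ASCII (.spe) file into R. The file contains a single column, mostly integers. However, R interprets these integers incorrectly, probably because I'm not specifying the correct format or something similar. The file was generated in the Ortec Maestro software. Here is the code: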

library(SDMTools)
strontium<-read.table("C:/Users/Hal 2/Desktop/beta_spec/strontium 90 spectrum.spe",header=F,skip=2)
str_spc<-vector(mode="numeric")
for (i in 1:2037)
{
str_spc[i]<-as.numeric(strontium$V1[i+13])
}

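Here, for example, strontium$V1[14] has the value 0, but R interprets it as 10. I think I may have to convert the data to some other format, or something like that, but I'm not sure, and I may be searching with the wrong terms.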

Here are the first few lines of the file:

$SPEC_ID:
No sample description was entered.
$SPEC_REM:
DET# 1
DETDESC# MCB 129
AP# Maestro Version 6.08
$DATE_MEA:
10/14/2014 15:13:16
$MEAS_TIM:
1516 1540
$DATA:
0 2047

Here is a link to the file: https://www.dropbox.com/sh/y5x68jen487qnmt/AABBZyC6iXBY3e6XH0XZzc5ba?dl=0

Any help is appreciated.

1 Answer:

Answer 0 (score: 0)

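I saw that someone had made a parser for SPE Spectra files in Python, and I couldn't let that stand without there being at least a minimally functional R version, so here's one that parses only some of the fields but does get you your data: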

library(stringr)
library(gdata)  # for startsWith(); base R >= 3.3.0 also provides startsWith()
library(lubridate)

read.spe <- function(file) {

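  # Read the whole file and join it into a single string
  tmp <- readLines(file)

  tmp <- paste(tmp, collapse="\n")

  # Every record starts with "$"; split on it and drop empty pieces
  records <- strsplit(tmp, "\\$")[[1]]
  records <- records[records!=""]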

  spe <- list()

  spe[["SPEC_ID"]] <- str_match(records[which(startsWith(records, "SPEC_ID"))],
                                "^SPEC_ID:[[:space:]]*([[:print:]]+)[[:space:]]+")[2]

  spe[["SPEC_REM"]] <- strsplit(str_match(records[which(startsWith(records, "SPEC_REM"))],
                                          "^SPEC_REM:[[:space:]]*(.*)")[2], "\n")

  spe[["DATE_MEA"]] <- mdy_hms(str_match(records[which(startsWith(records, "DATE_MEA"))],
                                         "^DATE_MEA:[[:space:]]*(.*)[[:space:]]$")[2])

  spe[["MEAS_TIM"]] <- strsplit(str_match(records[which(startsWith(records, "MEAS_TIM"))],
                                          "^MEAS_TIM:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]

  spe[["ROI"]] <- str_match(records[which(startsWith(records, "ROI"))],
                            "^ROI:[[:space:]]*(.*)[[:space:]]$")[2]

  spe[["PRESETS"]] <- strsplit(str_match(records[which(startsWith(records, "PRESETS"))],
                                         "^PRESETS:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]

  spe[["ENER_FIT"]] <- strsplit(str_match(records[which(startsWith(records, "ENER_FIT"))],
                                          "^ENER_FIT:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]

  spe[["MCA_CAL"]] <- strsplit(str_match(records[which(startsWith(records, "MCA_CAL"))],
                                         "^MCA_CAL:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]

  spe[["SHAPE_CAL"]] <- str_match(records[which(startsWith(records, "SHAPE_CAL"))],
                                  "^SHAPE_CAL:[[:space:]]*(.*)[[:space:]]*$")[2]

  spe_dat <- strsplit(str_match(records[which(startsWith(records, "DATA"))],
                                "^DATA:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]

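  # The first line of the DATA record is the channel range (e.g. "0 2047"),
  # not a count, so drop it with [-1] before converting
  spe[["SPE_DAT"]] <- as.numeric(gsub("[[:space:]]", "", spe_dat)[-1])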

  return(spe)

}

dat <- read.spe("strontium 90 spectrum.Spe")

str(dat)
## List of 10
##  $ SPEC_ID  : chr "No sample description was entered."
##  $ SPEC_REM :List of 1
##   ..$ : chr [1:3] "DET# 1" "DETDESC# MCB 129" "AP# Maestro Version 6.08"
##  $ DATE_MEA : POSIXct[1:1], format: "2014-10-14 15:13:16"
##  $ MEAS_TIM : chr "1516 1540"
##  $ ROI      : chr "0"
##  $ PRESETS  : chr [1:3] "None" "0" "0"
##  $ ENER_FIT : chr "0.000000 0.002529"
##  $ MCA_CAL  : chr [1:2] "3" "0.000000E+000 2.529013E-003 0.000000E+000 keV"
##  $ SHAPE_CAL: chr "3\n3.100262E+001 0.000000E+000 0.000000E+000"
##  $ SPE_DAT  : num [1:2048] 0 0 0 0 0 0 0 0 0 0 ...

head(dat$SPE_DAT)
## [1] 0 0 0 0 0 0
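
A likely explanation for the 10 in the question, assuming read.table() stored V1 as a factor (its default for text columns before R 4.0.0, and the header lines force this column to be text): as.numeric() on a factor returns the internal level codes, not the printed values. A minimal sketch with made-up values, not the real spectrum:

```r
# Assumption: V1 was read as a factor because the column mixes header
# text with numbers. The values below are made up for illustration.
v1 <- factor(c("header", "0", "0", "12", "7"))

# as.numeric() on a factor gives the integer level codes, not the data
codes <- as.numeric(v1)

# Converting to character first parses the labels themselves;
# the non-numeric header entry becomes NA (warning suppressed here)
counts <- suppressWarnings(as.numeric(as.character(v1)))

print(codes)
print(counts)
```

Passing stringsAsFactors = FALSE to read.table() avoids creating the factor in the first place.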

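It needs some polish and has absolutely no error checking (e.g., for missing fields), but I have no time today to deal with that. I'll finish up the parsing and make a minimal package wrapper for it over the next few days.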
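
As a rough idea of what the missing-field check could look like, here is a small sketch. The helper name get_record is hypothetical and not part of the function above; it assumes records has the shape produced by strsplit(tmp, "\\$") and uses only base R (startsWith() needs R >= 3.3.0).

```r
# Hypothetical helper (not part of read.spe above): return the body of the
# record called `name`, or NA if the file has no such record.
get_record <- function(records, name) {
  hit <- records[startsWith(records, paste0(name, ":"))]
  if (length(hit) == 0) return(NA_character_)
  # Drop the "NAME:" prefix, then trim surrounding whitespace and newlines
  trimws(substring(hit[1], nchar(name) + 2))
}

# Made-up input in the shape produced by strsplit(tmp, "\\$")
records <- c("SPEC_ID:\nNo sample description was entered.\n",
             "MEAS_TIM:\n1516 1540\n")

get_record(records, "MEAS_TIM")  # "1516 1540"
get_record(records, "ROI")       # NA, because no ROI record is present
```

Each field in the parser could then test for NA before applying str_match() or strsplit(), so a file without, say, a ROI record would not produce a zero-length result.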