grep(), str_match(): match the [DATA] string in a file and read the content below it once a match is found

Time: 2019-07-08 07:48:41

Tags: r regex stringr

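I am trying to read *.BAY files, whose contents are not delimited by commas, spaces, or tabs. I need to find the string [DATA] and, once it is matched, read everything below it.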

Below is the content of the file:

  

[FILEINFO] VERSION = V4.0 FILENAME = TEST1.BAY CREATIONTIME = 2017-10-05    16:05:28

     

[PARAMETER1] TXT = SENSITIVE UNIT = LSL = -41.800000 USL = -38.300000

     

[PARAMETER2] TXT = HARM UNIT = LSL = -1.000000 USL = 1.000000

     

[DATA]   1,29,-41.699,0.075,-1.642,-97.207,55.608,0.533,165.848,0.000,0.000,60.000   2,29,-40.637,0.126,-1.934,-96.637,56.100,0.649,153.259,0.000,1.000,60.000   3,29,-40.227,0.052,-1.850,-96.231,56.104,0.548,158.987,0.000,2.000,60.000

I read the file with the code below.

my_txt <- paste(readLines("/TEST1.BAY"), collapse = "")
my_txt

I used grep() to search for the [DATA] string. However, with the following pattern, grep only returns integer(0):

my_txt <- grep("^[DATA.*]$",my_txt)
my_txt

Any suggestions on how to match the pattern and read the content below [DATA]?

2 answers:

Answer 0: (score: 2)

Assuming you have already read the data in as a single string, you can remove everything up to and including "[DATA]" and then use read.csv:

read.csv(text = sub(".*\\[DATA\\]\\s+", "", my_txt), header = FALSE)

#  V1 V2      V3    V4     V5   ....  
#1  1 29 -41.699 0.075 -1.642   ....

This puts all the data in separate columns. If you want everything in a single column, replace the commas with the newline character "\n":

read.csv(text=gsub(",", "\n", sub(".*\\[DATA\\]\\s+", "", my_txt)), header = FALSE)

#         V1
#1         1
#2        29
#3   -41.699
#4     0.075
#5    -1.642
#....
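
As background on why the pattern in the question returns integer(0): unescaped square brackets in a regular expression define a character class, so "^[DATA.*]$" only matches a string made up of exactly one of the characters D, A, T, "." or "*". The sketch below illustrates this on a shortened, made-up stand-in for the file contents (the sample string and the variable names other than my_txt are assumptions for illustration only).

```r
# Shortened stand-in for the file contents, held as a single string
my_txt <- "[FILEINFO] VERSION = V4.0 [DATA]   1,29,-41.699   2,29,-40.637"

# Unescaped brackets form a character class, so this pattern can only match
# a one-character string; on the full text it finds nothing
no_match <- grep("^[DATA.*]$", my_txt)

# Escaping the brackets matches the literal marker
escaped <- grepl("\\[DATA\\]", my_txt)

# fixed = TRUE treats the whole pattern as a literal string, no escaping needed
literal <- grepl("[DATA]", my_txt, fixed = TRUE)

print(no_match)  # integer(0)
print(escaped)   # TRUE
print(literal)   # TRUE
```

Either the escaped form or fixed = TRUE should work for locating the marker; fixed = TRUE is usually easier to read when no regex features are needed.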

Answer 1: (score: 1)

A possible approach using strsplit:

# read data
my_txt <- paste(readLines("clipboard"), collapse = "")      
my_txt

# split in two strings when there is "[DATA]"
my_txt <- strsplit(my_txt, "[DATA]", fixed = TRUE)

# get second string
my_txt <- my_txt[[1]][2]

# convert to vector of numeric
data <- as.numeric(strsplit(my_txt, ",")[[1]])
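
If the records after [DATA] are separated by whitespace, as they appear to be in the sample in the question, a variation on the same idea keeps each record as its own row. This is only a sketch under that assumption: the sample string is abbreviated, and the names after, records and df are made up for illustration.

```r
# Abbreviated stand-in for the file contents, held as a single string
my_txt <- "[PARAMETER2] TXT = HARM [DATA]   1,29,-41.699,0.075   2,29,-40.637,0.126   3,29,-40.227,0.052"

# Keep only the text after the literal "[DATA]" marker
after <- strsplit(my_txt, "[DATA]", fixed = TRUE)[[1]][2]

# Split on whitespace so each comma-separated record becomes one element
records <- scan(text = after, what = "", quiet = TRUE)

# Parse the records as CSV rows: one row per record, one column per field
df <- read.csv(text = records, header = FALSE)
print(df)
```

With this input, df should have three rows and four columns (V1 to V4), which avoids flattening all values into one long vector.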