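I have automatically generated files whose format I cannot change. I would like to store the data in a suitable R structure. The files look like this: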
File: /path/to/file
Start Date: 07/05/16
Subject: 0
Start Time: 10:01:09
Name: FooBar
K: 0.000
O: 0.000
A:
0: 91.600 65.000 238.000 31.000 24.000
5: 7.000 22162.000 78.000 10.000 20000.000
10: 55.000 0.000 2.000 6.000 53.000
B:
0: 0.000 2.000 1.000 1.000 1.000
5: 1.000 1.000 1.000 1.000 1.000
[...] # Goes all the way to Z
Start Date: 07/05/16
Subject: 8
Start Time: 10:11:09
Name: JohnDoe
K: 0.000
O: 0.000
A:
0: 91.600 65.000 238.000 31.000 24.000
[...] # Goes all the way to Z
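I open the file with readLines, so each line is one long character string. Each file contains several sessions, each identified by its date, name, subject and time. Each session contains several numeric variables named after the letters of the alphabet (LETTERS). For example, in the first session (FooBar), K can be represented as c(0.000) and B as
c(0.000,2.000,1.000,1.000,1.000,1.000,1.000,1.000,1.000,1.000)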
The first lines (File, Start Date, Start Time, Name) hold the session information, which I was able to store in this data frame:
#Sessions data.frame
structure(list(`Start Date` = c("07/05/16", "07/05/16"), Subject = c("0", "8"), `Start Time` = c("10:01:09",
"10:11:09"), Name = c("FooBar",
"JohnDoe"
)), .Names = c("Start Date", "Subject", "Start Time", "name"), row.names = 1:2, class = "data.frame")
There are two things I am struggling with.
I considered a combination of apply, startsWith and scan, but I could not find the best way to structure the data.
Answer 0 (score: 0)
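Maybe not exactly what you are after, but it is the best I can think of. One drawback is that the code grows vectors inside a loop, because we cannot know in advance how long each vector will be.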
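If growing vectors inside the loop becomes a concern for large files, one common workaround is to collect the pieces in a list and combine them once at the end. This is only a minimal sketch: `data_lines`, `chunks` and `values` are made-up names, and the sample lines are invented for illustration.

```r
# Hypothetical sample: three data lines that belong to one letter.
data_lines <- c("0: 91.600 65.000", "5: 7.000 22162.000", "10: 55.000 0.000")

chunks <- vector("list", length(data_lines))   # one slot per line, allocated once
for (i in seq_along(data_lines)) {
  # Split on runs of spaces after trimming any leading indentation
  fields <- strsplit(trimws(data_lines[i]), " +")[[1]]
  chunks[[i]] <- as.numeric(fields[-1])        # drop the "0:" style index
}
values <- unlist(chunks)                       # single concatenation at the end
```

Whether this is worth the extra bookkeeping probably depends on file size; for a few hundred lines the simple loop is likely fine.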
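We cannot create a data.frame because the vectors A, B, etc. do not all have the same length (they would have to be padded with NA, which does not sound useful here).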
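For reference, the NA padding mentioned above could look roughly like this. It is a sketch only: `vecs`, `padded` and `df` are hypothetical names, and the two short vectors merely stand in for A and B.

```r
# Hypothetical vectors of unequal length, standing in for A and B.
vecs <- list(A = c(91.6, 65, 238), B = c(0, 2))

n <- max(lengths(vecs))              # the longest vector decides the row count
padded <- lapply(vecs, function(x) {
  length(x) <- n                     # extending a vector fills the tail with NA
  x
})
df <- as.data.frame(padded)          # columns A and B, both of length n
```

The padded rows carry no information, which is why the list structure below is used instead.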
sessions <- list()
sd <- subject <- st <- sname <- cvec <- ""
lines <- readLines("c:/tmp/test.txt")
# Line prefixes to recognise. "File:" comes first so it is not mistaken for
# the letter F; the trailing " " catches indented data lines.
cases <- c("File:", "Start Date:", "Subject:", "Start Time:", "Name:", LETTERS, " ")
# Field name for each prefix; NA marks a data line
lnames <- c("File", "SDate", "Subject", "STime", "Name", LETTERS, NA)
for (l in lines) { # loop over the lines
  if (nchar(l) < 2) # skip empty or one-character lines
    next
  # The first matching prefix gives the field name; no match also yields NA
  v <- lnames[match(TRUE, startsWith(l, cases))]
  fields <- strsplit(l, " ")[[1]]
  # Here comes the fun: for each case, store the value or update a vector
  if (is.na(v)) { # no field: a line of the form "spaces digit: space-separated values"
    vals <- fields[nzchar(fields)]
    # Concatenate with the previous values for this letter;
    # as.numeric (rather than as.integer) keeps decimals such as 91.600
    sessions[[sname]][[cvec]] <-
      c(sessions[[sname]][[cvec]], as.numeric(vals[-1]))
  }
  else if (v == "SDate")
    sd <- fields[3]
  else if (v == "Subject")
    subject <- fields[2]
  else if (v == "STime")
    st <- fields[3]
  else if (v == "Name") {
    sname <- fields[2]
    # Create a new session list entry
    sessions[[sname]] <- list(
      "SDate" = sd,
      "STime" = st,
      "Subject" = as.numeric(subject)
    )
  }
  else if (v %in% LETTERS) { # switch letter vector, using the values on this line if there are any
    cvec <- v
    sessions[[sname]][[cvec]] <- vector("numeric")
    if (length(fields) > 1) {
      vals <- fields[-1]
      sessions[[sname]][[cvec]] <- as.numeric(vals[nzchar(vals)])
    }
  }
  # "File:" lines match none of the branches and are ignored
}
This creates a list of lists:
> str(sessions)
List of 2
 $ FooBar :List of 7
  ..$ SDate  : chr "07/05/16"
  ..$ STime  : chr "10:01:09"
  ..$ Subject: num 0
  ..$ K      : num 0
  ..$ O      : num 0
  ..$ A      : num [1:15] 91.6 65 238 31 24 ...
  ..$ B      : num [1:10] 0 2 1 1 1 1 1 1 1 1
 $ JohnDoe:List of 7
  ..$ SDate  : chr "07/05/16"
  ..$ STime  : chr "10:11:09"
  ..$ Subject: num 8
  ..$ K      : num 0
  ..$ O      : num 0
  ..$ A      : num [1:15] 91.6 65 238 31 24 ...
  ..$ B      : num [1:10] 0 2 1 1 1 1 1 1 1 1
For the session "FooBar", this gives:
> sessions$FooBar
$SDate
[1] "07/05/16"
$STime
[1] "10:01:09"
$Subject
[1] 0
$K
[1] 0
$O
[1] 0
$A
[1] 91.6 65.0 238.0 31.0 24.0 7.0 22162.0 78.0 10.0 20000.0 55.0 0.0 2.0 6.0 53.0
$B
[1] 0 2 1 1 1 1 1 1 1 1
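To get back to something like the Sessions data frame from the question, the scalar entries of each session could be collected into one row per session. The following is a sketch under the assumption that the list has the shape shown above; the small `sessions` object is a hand-built stand-in rather than parsed output, and `meta` is a hypothetical name.

```r
# Hand-built stand-in with the same shape as the parsed result (assumption).
sessions <- list(
  FooBar  = list(SDate = "07/05/16", STime = "10:01:09", Subject = 0, A = c(91.6, 65)),
  JohnDoe = list(SDate = "07/05/16", STime = "10:11:09", Subject = 8, A = c(91.6, 65))
)

# One single-row data frame per session, stacked with rbind.
meta <- do.call(rbind, lapply(names(sessions), function(nm) {
  s <- sessions[[nm]]
  data.frame(Name = nm, SDate = s$SDate, STime = s$STime, Subject = s$Subject,
             stringsAsFactors = FALSE)
}))
```

The letter vectors stay in the list, so `sessions[[meta$Name[1]]]$A` would still return the full vector for the first session.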