Question

I'm reading some text files that contain rows of data, with a few header lines at the top that hold information about the data, like this:

Test file
#
File information
1 2 3 4
#
a 2
b 4
c 6
d 8

I'd like to read the various pieces of information from this file separately. I can do it like this:

file <- read.table(txt, nrow = 1)
name <- read.table(txt, nrow = 1, skip = 2)
vals <- read.table(txt, nrow = 1, skip = 3)
data <- read.table(txt,           skip = 5)

Because the two comment lines are otherwise empty, I could also read the data like this:

file <- read.table(txt, nrow = 1)
name <- read.table(txt, nrow = 1, skip = 1)  # Skip changed from 2
vals <- read.table(txt, nrow = 1, skip = 3)
data <- read.table(txt,           skip = 4)  # Skip changed from 5
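Both sets of skip values work because of read.table's defaults (comment.char = "#", blank.lines.skip = TRUE): a line holding only a comment becomes a blank line, and blank lines are passed over once the first skip raw lines have been discarded. The skip count itself still includes every raw line. A minimal check of that, assuming the sample file is held in a string txt and passed through read.table's text= argument instead of a file name:

```r
txt <- "Test file
#
File information
1 2 3 4
#
a 2
b 4
c 6
d 8"

# skip counts raw lines; a comment-only line left over after skipping
# is treated as blank and passed over.
name_a <- read.table(text = txt, nrows = 1, skip = 2)  # skips "Test file" and "#"
name_b <- read.table(text = txt, nrows = 1, skip = 1)  # "#" is then skipped as blank
data_a <- read.table(text = txt, skip = 5)
data_b <- read.table(text = txt, skip = 4)

stopifnot(identical(name_a, name_b), identical(data_a, data_b))
```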

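That works, but the text files don't always have the same number of empty comment lines; sometimes they are there and sometimes they aren't. If one (or both) of the comment lines were missing from the example file, neither of my solutions would work anymore.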

Is there a more robust way to read the text file so that skip never counts the comment lines?

Answer 1

(Assumption: after the file metadata at the top, once the data starts there are no more comments.)

(The textConnection(...) calls are there to trick functions that expect a file connection into processing a string. Replace them with your file name.)
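To illustrate that substitution: a textConnection() over a string yields the same lines as the file itself, so swapping one for the other leaves the results unchanged. A minimal sketch, assuming the sample is written to a temporary file created with tempfile() as a stand-in for your real file name; read.table also accepts the string directly through its text= argument:

```r
txt <- "Test file
#
File information
1 2 3 4
#
a 2
b 4
c 6
d 8"

# Write the sample to a temporary file to stand in for the real file name.
path <- tempfile(fileext = ".txt")
writeLines(txt, path)

# Reading lines from the string and from the file gives the same result.
con <- textConnection(txt)
from_string <- readLines(con)
close(con)
from_file <- readLines(path)

# The same holds for read.table; text = txt reads via a text connection.
dat_string <- read.table(text = txt, skip = 5)
dat_file <- read.table(path, skip = 5)

stopifnot(identical(from_string, from_file), identical(dat_string, dat_file))
```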

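One approach is to read the first several lines of the file (some number "guaranteed" to include all of the comment/non-data lines), find the last comment line, and then process everything before and after it accordingly:

txt <- "Test file
#
File information
1 2 3 4
#
a 2
b 4
c 6
d 8"
max_comment_lines <- 8
(dat <- readLines(textConnection(txt), n = max_comment_lines))
# [1] "Test file"        "#"                "File information" "1 2 3 4"
# [5] "#"                "a 2"              "b 4"              "c 6"
(skip <- max(grep("^\\s*#", dat)))
# [1] 5

(As an aside: you should check that there actually are comments. Otherwise grep() returns integer(0), max() of that gives -Inf with a warning, and the read* functions won't accept that as an argument.)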
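The check suggested in the aside could look like the following minimal sketch. The helper name last_comment_line and its error message are hypothetical, not part of the answer; it assumes the lines have already been read (for example with readLines) and uses the same "^\\s*#" pattern:

```r
# Hypothetical helper: index of the last comment line, failing loudly
# instead of letting max() return -Inf when there are no comments.
last_comment_line <- function(lines, pattern = "^\\s*#") {
  hits <- grep(pattern, lines)
  if (length(hits) == 0L) {
    stop("no comment lines found; cannot locate the start of the data")
  }
  max(hits)
}

with_comments <- c("Test file", "#", "File information", "1 2 3 4", "#", "a 2")
without_comments <- c("a 2", "b 4")

skip <- last_comment_line(with_comments)
skip
# [1] 5

# Without the check, max() would return -Inf here and only warn.
res <- tryCatch(last_comment_line(without_comments), error = function(e) NA_integer_)
res
# [1] NA
```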

Now that we "know" the last comment is on line 5, we can use the first 4 lines to get the header information...

meta <- readLines(textConnection(txt), n = skip - 1)
meta <- meta[! grepl("^\\s*#", meta) ] # remove the comment rows themselves
meta
# [1] "Test file"        "File information" "1 2 3 4"

...and skip 5 lines to get the data, using read.table.
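A minimal sketch of that data step, assuming the file's lines are already in a character vector (read.table's text= argument accepts one line per element) and, as stated at the top, that no comments follow the start of the data. The helper name data_after_last_comment is hypothetical. Because skip is computed from the text rather than hard-coded, the same call still lands on the data when the first comment line is missing, which is the case that broke the fixed skip values in the question:

```r
lines <- c("Test file", "#", "File information", "1 2 3 4", "#",
           "a 2", "b 4", "c 6", "d 8")

# Hypothetical helper: skip every line up to and including the last comment
# line, then let read.table parse the rest. Assumes at least one comment line.
data_after_last_comment <- function(lines) {
  skip <- max(grep("^\\s*#", lines))
  read.table(text = lines, skip = skip)
}

dat <- data_after_last_comment(lines)
dat

# Remove the first "#" line: skip becomes 4 instead of 5, and the
# resulting data frame is the same.
dat_short <- data_after_last_comment(lines[-2])
stopifnot(identical(dat, dat_short))
```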
