Question

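How can I import a file that:

starts with an undefined number of comment lines,
followed by a header line in which some column names contain the same comment character used to mark the comment lines above?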

For example, with a file like this:

# comment 1
# ...
# comment X
c01,c#02,c03,c04
1,2,3,4
5,6,7,8

Then:

myDF = read.table(myfile, sep=',', header=T)

Error in read.table(myfile, sep = ",", header = T) : more columns than column names

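The obvious problem is that # is used as the comment character to mark the comment lines, but it also appears in the header (admittedly bad practice, but I have no control over it).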

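Since the number of comment lines is not known in advance, I can't even use the skip argument. I also don't know the column names (or even how many there are) before importing, so I really need to read them from the file.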

Is there any solution other than editing the file by hand?

Answer 1

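It is probably easy enough to count the lines at the top of the file that start with the comment character, and then skip them.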

csvfile <- "# comment 1
# ...
# comment X
c01,c#02,c03,c04
1,2,3,4
5,6,7,8"

# return a logical for whether the line starts with a comment.
# remove everything from the first FALSE and afterward
# take the sum of what's left
start_comment <- grepl("^#", readLines(textConnection(csvfile)))
start_comment <- sum(head(start_comment, which(!start_comment)[1] - 1))

# skip the lines that start with the comment character
Data <- read.csv(textConnection(csvfile),
                 skip = start_comment,
                 stringsAsFactors = FALSE)
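
One detail worth checking, offered as an aside rather than as part of the original answer: by default read.csv passes the header through make.names (check.names = TRUE), so a name such as c#02 comes back as c.02. A minimal sketch, assuming you want the header kept exactly as written, passes check.names = FALSE. The variable names is_comment, n_skip, checked and verbatim are only illustrative.

```r
csvfile <- "# comment 1
# ...
# comment X
c01,c#02,c03,c04
1,2,3,4
5,6,7,8"

# Count the leading comment lines, as in the code above.
is_comment <- grepl("^#", readLines(textConnection(csvfile)))
n_skip <- sum(head(is_comment, which(!is_comment)[1] - 1))

# Default behaviour: header names are made syntactically valid.
checked <- read.csv(textConnection(csvfile), skip = n_skip)

# check.names = FALSE keeps the header exactly as written in the file.
verbatim <- read.csv(textConnection(csvfile), skip = n_skip,
                     check.names = FALSE)

names(checked)   # "c01" "c.02" "c03" "c04"
names(verbatim)  # "c01" "c#02" "c03" "c04"
```

Columns with non-syntactic names then need backticks or double brackets, for example verbatim[["c#02"]].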

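Note that this works with read.csv because read.csv defaults to comment.char = "". If you must use read.table, or must keep comment.char = "#", a few more steps may be needed: read the header line on its own with comment handling turned off, then read the data rows separately.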

start_comment <- grepl("^#", readLines(textConnection(csvfile)))
start_comment <- sum(head(start_comment, which(!start_comment)[1] - 1))

# Get the headers by themselves.
Head <- read.table(textConnection(csvfile),
                   skip = start_comment,
                   header = FALSE,
                   sep = ",",
                   comment.char = "",
                   nrows = 1)

Data <- read.table(textConnection(csvfile),
                   sep = ",",
                   header = FALSE,
                   skip = start_comment + 1,
                   stringsAsFactors = FALSE)

# apply column names to Data
names(Data) <- unlist(Head)
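
The same steps can also be folded into one function that reads the file only once. This is a sketch under stated assumptions, not the answerer's code: the name read_commented_csv is hypothetical, comment lines are assumed to appear only at the top of the file, and the text argument of read.csv is used to parse the lines that were already read.

```r
# Hypothetical helper: read a CSV with leading "#" comment lines and a
# header that may itself contain "#".
read_commented_csv <- function(con) {
  lines <- readLines(con)
  # The first line that does not start with "#" is taken to be the header.
  first_data <- which(!grepl("^#", lines))[1]
  if (is.na(first_data)) stop("no header line found")
  read.csv(text = lines[first_data:length(lines)],
           comment.char = "",        # "#" in the header is not a comment
           check.names = FALSE,      # keep names such as "c#02" unchanged
           stringsAsFactors = FALSE)
}

csvfile <- "# comment 1
# ...
# comment X
c01,c#02,c03,c04
1,2,3,4
5,6,7,8"

con <- textConnection(csvfile)
Data2 <- read_commented_csv(con)
close(con)
```

For a file on disk, readLines also accepts a path, so read_commented_csv("myfile.csv") should work the same way (the file name here is only a placeholder).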
