Question

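I have a text file that I want to read in R (and store in a data.frame). The file is organized in rows and columns. Both "sep" and "eol" are customized.

Problem: the custom eol, i.e. "\t&nd" (without the quotation marks), can't be set in read.table(...) (or read.csv(...), read.csv2(...), ...) nor in fread(...), and I can't find a solution.

I have searched here ("[r] read eol" and other queries I don't remember) and couldn't find a solution. The only suggestion was to preprocess the file and change the eol, which is not possible in my case because some fields contain things like \n, \r\n, ", ... That is the reason for the custom eol.

Thanks!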

Answer 1

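You can approach this in two ways:

A. If the file is not too wide, you can use scan to read the rows you need and strsplit to split them into columns, then combine the result into a data.frame. For example: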

# Provide reproducible example of the file ("raw.txt" here) you are starting with
your_text <- "a~b~c!1~2~meh!4~5~wow"
write(your_text,"raw.txt"); rm(your_text)  

eol_str = "!" # the character the rows divide on (scan() needs a single-character sep)
sep_str = "~" # whatever character(s) the columns divide on

# read and parse the text file
# scan gives you a vector of row strings (one string per row)
# strsplit gives you a list of row vectors (as many elements per row as columns)
row_list <- strsplit(scan("raw.txt", what=character(), sep=eol_str),
                     split=sep_str, fixed=TRUE)

df <- data.frame(do.call(rbind,row_list[2:length(row_list)]))
row.names(df) <- NULL
names(df) <- row_list[[1]]

df
#   a b   c
# 1 1 2 meh
# 2 4 5 wow
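
Because scan() only accepts a single-character sep, the same idea needs a small change for a multi-character eol such as the "\t&nd" from the question. The following is a minimal sketch, not a tested recipe for the asker's real file: it assumes the whole file fits in memory, and the demo file, the "~" separator, and the variable names are made up for illustration.

```r
# Hypothetical demo file: "~" separates columns, "\t&nd" ends each row
raw_path <- tempfile(fileext = ".txt")
writeChar("a~b~c\t&nd1~2~meh\t&nd4~5~wow\t&nd", raw_path, eos = NULL)

eol_str <- "\t&nd"  # custom end-of-line marker from the question
sep_str <- "~"      # assumed column separator

# Read the whole file into one string
txt <- readChar(raw_path, nchars = file.info(raw_path)$size)

# fixed = TRUE matches the delimiters literally instead of as regular expressions
rows  <- strsplit(txt, eol_str, fixed = TRUE)[[1]]
rows  <- rows[nzchar(rows)]                    # drop any empty row strings
cells <- strsplit(rows, sep_str, fixed = TRUE) # one character vector per row

# First row holds the column names, the rest hold the data
df <- data.frame(do.call(rbind, cells[-1]), stringsAsFactors = FALSE)
names(df) <- cells[[1]]
df
```

Every column comes back as character here, so a numeric column would still need converting (for example with type.convert). Note also that strsplit drops a trailing empty field, so rows whose last column is empty would need extra handling.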

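B. If A doesn't work, I agree with @BondedDust that you probably need an external utility, but you can call it from within R using system() and do a find/replace to reformat the file for read.table. Your call will be specific to your operating system. Example: https://askubuntu.com/questions/20414/find-and-replace-text-within-a-file-using-commands. Since you have noted that the text already contains \n and \r\n, I suggest you first find them and replace them with temporary placeholders, perhaps quoted versions of themselves, and then convert them back after building the data.frame.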
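
The placeholder idea can also be tried without leaving R. The sketch below swaps in gsub() for the external find/replace utility; it is an illustration under assumptions, not the method the answer prescribes. It assumes the file fits in memory, that the separator is "~", and that the placeholder strings "<CR>" and "<LF>" (both invented here) never occur in the data.

```r
# Hypothetical demo file: "~" separates columns, "\t&nd" ends each row,
# and one field contains a raw newline
raw_path <- tempfile(fileext = ".txt")
writeChar("a~b~c\t&nd1~2~line one\nline two\t&nd4~5~wow\t&nd", raw_path, eos = NULL)

eol_str <- "\t&nd"  # custom end-of-line marker from the question
sep_str <- "~"      # assumed column separator

# Read the whole file into one string
txt <- readChar(raw_path, nchars = file.info(raw_path)$size)

# 1. Protect line-break characters embedded in fields with placeholders
txt <- gsub("\r", "<CR>", txt, fixed = TRUE)
txt <- gsub("\n", "<LF>", txt, fixed = TRUE)

# 2. Turn the custom eol into a real newline so read.table can parse it
txt <- gsub(eol_str, "\n", txt, fixed = TRUE)

# 3. Parse; quoting and comment handling are switched off so that
#    quote and "#" characters in fields are kept as data
df <- read.table(text = txt, sep = sep_str, header = TRUE,
                 quote = "", comment.char = "", stringsAsFactors = FALSE)

# 4. Convert the placeholders back in the character columns
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], function(col) {
  col <- gsub("<CR>", "\r", col, fixed = TRUE)
  gsub("<LF>", "\n", col, fixed = TRUE)
})
```

Unlike approach A, read.table infers column types here, so the a and b columns in this demo come back as integers.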

In R, how to read a file with a custom end of line (eol)
