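Storing the arguments of multiple read.table() calls for identically structured files in a variable

Time: 2019-05-25 06:33:49

Tags: r dataframe read.table

I often have several files with the same structure but different contents, and I end up with ugly, repetitive read.table() lines. For example: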

df1 <- read.table("file1.tsv", fill = T, header = T, stringsAsFactors = F, quote = "", sep = "\t")
df2 <- read.table("file2.tsv", fill = T, header = T, stringsAsFactors = F, quote = "", sep = "\t")
df3 <- read.table("file3.tsv", fill = T, header = T, stringsAsFactors = F, quote = "", sep = "\t")
df4 <- read.table("file4.tsv", fill = T, header = T, stringsAsFactors = F, quote = "", sep = "\t")

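Is there a way to store the arguments in a variable, or to set defaults somehow, so that I can avoid this repetition? (Maybe not; I have been writing too much Python lately.)

Naively, I tried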

read_parameters <- c(fill = T, header = T, stringsAsFactors = F, quote = "", sep = "\t")
df1 <- read.table("file1.tsv", read_parameters)

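but this results in the error Error in !header : invalid argument type

Alternatively, I could run a loop over the files, but I have never figured out how to name data frames iteratively inside a loop in R. In any case, I think an answer to this question would be useful to others, since I assume this is a common situation.

1 Answer:

Answer 0 (score: 0)

You can write a wrapper function around read.table and set the default arguments as needed.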

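# Wrapper around read.table() whose defaults match the shared file layout.
# TRUE/FALSE are spelled out because T and F are ordinary variables that can be reassigned.
my.read.table <- function(temp.source, fill = TRUE, header = TRUE, stringsAsFactors = FALSE, quote = "", sep = "\t")
{
 return(read.table(temp.source, fill = fill, header = header, stringsAsFactors = stringsAsFactors, quote = quote, sep = sep))
}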

Then you can simply call this function as follows:

df <- my.read.table("file1.tsv")
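If some files occasionally need an extra read.table() argument, the wrapper can forward anything it does not name itself through `...`. The following is only a sketch: the function name my.read.table.dots and the temporary demo file are made up for illustration and are not part of the answer above.

```r
# Hypothetical variant of the wrapper: defaults as formals, extras via `...`.
my.read.table.dots <- function(temp.source, fill = TRUE, header = TRUE,
                               stringsAsFactors = FALSE, quote = "",
                               sep = "\t", ...)
{
  read.table(temp.source, fill = fill, header = header,
             stringsAsFactors = stringsAsFactors, quote = quote,
             sep = sep, ...)
}

# Self-contained demo on a temporary file with made-up content.
demo_file <- tempfile(fileext = ".tsv")
writeLines(c("id\tname", "1\talpha", "2\tbeta", "3\tgamma"), demo_file)

df_all   <- my.read.table.dots(demo_file)             # defaults only
df_first <- my.read.table.dots(demo_file, nrows = 2)  # extra argument passed through
```

Any default can still be overridden by name in the call, for example header = FALSE.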

Or you can use lapply to call the same function on each source string. The result is a list with one data frame per file.

sources.to.load <- c("file1.tsv", "file2.tsv", "file3.tsv")
df_list <- lapply(sources.to.load, read.table, fill = T, header = T, stringsAsFactors = F, quote = "", sep = "\t")
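Regarding the question of naming data frames in a loop: one common approach is to keep them in a named list rather than creating separate variables. The sketch below assumes that naming each element after its file (without the extension) is what is wanted; the temporary directory and file contents are invented so that the example runs on its own.

```r
# Create three small example files in a temporary directory (made-up data).
demo_dir <- tempfile("tsv_demo")
dir.create(demo_dir)
sources.to.load <- file.path(demo_dir, c("file1.tsv", "file2.tsv", "file3.tsv"))
for (i in seq_along(sources.to.load)) {
  writeLines(c("id\tvalue", paste(i, i * 10, sep = "\t")), sources.to.load[i])
}

# Read every file with the same arguments.
df_list <- lapply(sources.to.load, read.table, fill = TRUE, header = TRUE,
                  stringsAsFactors = FALSE, quote = "", sep = "\t")

# Name each list element after its file, without directory or extension.
names(df_list) <- tools::file_path_sans_ext(basename(sources.to.load))

# Access a single data frame by name.
df_file2 <- df_list[["file2"]]
```

Keeping the data frames in one list also makes it easy to apply the same operation to all of them with another lapply call.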

Edit: If you also want to keep the parameter-vector approach, you can add it to the wrapper function.

my.read.table2 <- function(temp.source, fill = TRUE, header = TRUE, stringsAsFactors = FALSE, quote = "", sep = "\t", parameterstring)
{
 # missing() is needed here: exists("parameterstring") is TRUE inside the function even when the argument was not supplied
 if(!missing(parameterstring))
 {
  # c() coerces mixed types to character, so the logical settings are converted back
  fill <- as.logical(parameterstring[1])
  header <- as.logical(parameterstring[2])
  stringsAsFactors <- as.logical(parameterstring[3])
  quote <- parameterstring[[4]]
  sep <- parameterstring[[5]] # to look the value up by name instead of by position: sep <- parameterstring[["sep"]]
 }
 return(read.table(temp.source, fill = fill, header = header, stringsAsFactors = stringsAsFactors, quote = quote, sep = sep))
}

Then you can simply call this function as follows:

df <- my.read.table2("file1.tsv") # this will call the function with the default settings
df2 <- my.read.table2("file1.tsv", parameterstring = read_parameters) # this will overwrite the default settings by the parameters stored in read_parameters
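As an alternative to the wrapper, and not part of the answer above, the arguments can be stored in a list and spliced into the call with do.call(). The attempt in the question fails for two reasons: c() coerces the mixed values to a character vector, and that vector is then passed positionally as the second argument of read.table(), which is header, so !header is applied to a character vector. A minimal sketch, using a made-up temporary file so that it runs on its own:

```r
# Write a tiny tab-separated file so the sketch is self-contained.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tname", "1\talpha", "2\tbeta"), tmp)

# A list (unlike c()) keeps each argument's own type.
read_args <- list(fill = TRUE, header = TRUE, stringsAsFactors = FALSE,
                  quote = "", sep = "\t")

# do.call() passes the list elements to read.table() as named arguments.
df1 <- do.call(read.table, c(list(file = tmp), read_args))
```

The same read_args list can be reused for every file, so only the file name changes between calls.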