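I often run into several files that share the same structure but have different contents, which leaves me with ugly, repetitive read.table() lines. For example: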
df1 <- read.table("file1.tsv", fill = T, header = T, stringsAsFactors = F, quote = "", sep = "\t")
df2 <- read.table("file2.tsv", fill = T, header = T, stringsAsFactors = F, quote = "", sep = "\t")
df3 <- read.table("file3.tsv", fill = T, header = T, stringsAsFactors = F, quote = "", sep = "\t")
df4 <- read.table("file4.tsv", fill = T, header = T, stringsAsFactors = F, quote = "", sep = "\t")
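Is there a way to store the arguments in a variable, or otherwise set defaults, so I can avoid this repetition? (Maybe not; I've been writing too much Python lately.)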
Naively, I tried:
read_parameters <- c(fill = T, header = T, stringsAsFactors = F, quote = "", sep = "\t")
df1 <- read.table("file1.tsv", read_parameters)
but this fails with the error Error in !header : invalid argument type.
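The error has two causes: c() coerces every value to a character string (TRUE becomes "TRUE"), and the whole vector is then matched positionally to read.table()'s second argument, header, so !header ends up being applied to a character vector. A minimal sketch of the usual workaround, assuming a list plus do.call() fits your needs: keep the arguments in a list so each value keeps its type, and let do.call() splice them into the call. The temporary file here just stands in for file1.tsv.

```r
# Store the shared arguments in a list; c() would coerce them all to character
read_args <- list(fill = TRUE, header = TRUE, stringsAsFactors = FALSE,
                  quote = "", sep = "\t")

# Write a small example TSV so the sketch runs on its own (stand-in for file1.tsv)
tsv <- tempfile(fileext = ".tsv")
writeLines(c("id\tname", "1\talpha", "2\tbeta"), tsv)

# do.call() builds read.table(file = tsv, fill = TRUE, header = TRUE, ...)
df1 <- do.call(read.table, c(list(file = tsv), read_args))
str(df1)
```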
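Alternatively, I could loop over the files, but I have never worked out how to name data frames iteratively inside a loop in R. In any case, I think an answer to this question could be useful to others, since I imagine this is a common situation.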
Answer (score: 0)
You can write a wrapper function around read.table() and give it whatever default arguments you need:
my.read.table <- function(temp.source, fill = TRUE, header = TRUE, stringsAsFactors = FALSE, quote = "", sep = "\t")
{
  # pass the (possibly overridden) defaults on to read.table()
  return(read.table(temp.source, fill = fill, header = header, stringsAsFactors = stringsAsFactors, quote = quote, sep = sep))
}
Then you can simply call it like this:
df <- my.read.table("file1.tsv")
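A possible variant, not part of the original answer: adding ... to the wrapper (the name my.read.table.dots is hypothetical) lets any other read.table() argument, such as nrows or colClasses, pass straight through, while the named defaults can still be overridden per call. A minimal sketch using a temporary file:

```r
# Hypothetical variant of the wrapper: `...` forwards any other read.table() argument
my.read.table.dots <- function(temp.source, fill = TRUE, header = TRUE,
                               stringsAsFactors = FALSE, quote = "", sep = "\t", ...)
{
  read.table(temp.source, fill = fill, header = header,
             stringsAsFactors = stringsAsFactors, quote = quote, sep = sep, ...)
}

tsv <- tempfile(fileext = ".tsv")
writeLines(c("id\tname", "1\talpha", "2\tbeta", "3\tgamma"), tsv)

# Defaults apply; nrows is passed through `...` to read.table()
df_head <- my.read.table.dots(tsv, nrows = 2)

# A named default can still be overridden: the header line is now read as data
df_raw <- my.read.table.dots(tsv, header = FALSE)
```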
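Alternatively, you can use lapply() to call read.table() on each file name, passing the same extra arguments to every call; the result is a list with one data frame per file: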
sources.to.load <- c("file1.tsv", "file2.tsv", "file3.tsv")
df_list <- lapply(sources.to.load, read.table, fill = T, header = T, stringsAsFactors = F, quote = "", sep = "\t")
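On the question of naming data frames in a loop: a common R idiom is to keep them in a named list rather than in separate variables. A sketch, assuming two temporary files in place of file1.tsv and file2.tsv; with real files, names such as tools::file_path_sans_ext(basename(files)) would give file1, file2, and so on. If separate variables are really wanted, list2env() can copy each list element into the global environment.

```r
# Two temporary files stand in for file1.tsv and file2.tsv
files <- c(tempfile(fileext = ".tsv"), tempfile(fileext = ".tsv"))
writeLines(c("id\tvalue", "1\t10"), files[1])
writeLines(c("id\tvalue", "2\t20", "3\t30"), files[2])

df_list <- lapply(files, read.table, fill = TRUE, header = TRUE,
                  stringsAsFactors = FALSE, quote = "", sep = "\t")

# Name the list elements instead of creating df1, df2, ... by hand
names(df_list) <- paste0("df", seq_along(df_list))
df_list$df2

# Optional: create separate variables df1 and df2 in the global environment
invisible(list2env(df_list, envir = globalenv()))
```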
Edit: if you also want to keep the argument-vector approach, you can build it into the wrapper function:
my.read.table2 <- function(temp.source, fill = TRUE, header = TRUE, stringsAsFactors = FALSE, quote = "", sep = "\t", parameterstring)
{
  # use missing(), not exists(): the formal argument always "exists" inside the
  # function, so exists() is TRUE even when no vector was supplied
  if (!missing(parameterstring))
  {
    # c() turned every value into a string, so convert the flags back to logical;
    # looking values up by name avoids depending on their order in the vector
    fill <- as.logical(parameterstring[["fill"]])
    header <- as.logical(parameterstring[["header"]])
    stringsAsFactors <- as.logical(parameterstring[["stringsAsFactors"]])
    quote <- parameterstring[["quote"]]
    sep <- parameterstring[["sep"]]
  }
  return(read.table(temp.source, fill = fill, header = header, stringsAsFactors = stringsAsFactors, quote = quote, sep = sep))
}
Then you can call it like this:
df <- my.read.table2("file1.tsv") # this will call the function with the default settings
df2 <- my.read.table2("file1.tsv", parameterstring = read_parameters) # this will override the default settings with the values stored in read_parameters
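As an alternative to the character vector (a sketch, not from the original answer; read_defaults and my.read.table3 are hypothetical names), storing the defaults in a list keeps each value's type, so no as.logical() conversion is needed, and modifyList() lets a caller override only the arguments that differ:

```r
# Hypothetical defaults, kept in a list so TRUE/FALSE stay logical
read_defaults <- list(fill = TRUE, header = TRUE, stringsAsFactors = FALSE,
                      quote = "", sep = "\t")

# Hypothetical helper: merge caller overrides into the defaults, then call read.table()
my.read.table3 <- function(file, params = list())
{
  args <- modifyList(read_defaults, params)  # entries in params replace matching defaults
  do.call(read.table, c(list(file = file), args))
}

csv <- tempfile(fileext = ".csv")
writeLines(c("id,name", "1,alpha"), csv)

# Only sep is overridden; fill, header, etc. keep their default values
df_csv <- my.read.table3(csv, params = list(sep = ","))
```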