R loop reads the first file twice?

Time: 2019-01-07 19:53:12

Tags: r for-loop nested-loops

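I wrote a loop that reads in multiple text files, performs some operations on each one, and combines the results. I have copied it below with a comment on each line. However, the first file in source_files is read in twice (and added to my final table)! I would also like to simplify this loop.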

source_files<-list.files(pattern="_output.txt") # finds all files whose names match "_output.txt"
The source_files vector above lists the files to be read in by the loop below.

for (i in source_files){
    if (!exists("final_table")){
        df_import<-read.table(i, header=FALSE, sep="\t") # reads in each file
        names<-unlist(strsplit(i,"_")) # splits the file name on "_" and stores the parts in 'names'
        df_import$Sample<-names[1] # adds a Sample column holding the first part of the file name
        df_import$DB<-names[2] # adds a DB column holding the second part of the file name
        final_table<-df_import # creates the final table data frame
        rm(df_import) # remove excess df
        }
    if (exists("final_table")){
        df_import<-read.table(i, header=FALSE, sep="\t") # reads in each file
        names<-unlist(strsplit(i,"_")) # splits the file name on "_" and stores the parts in 'names'
        df_import$Sample<-names[1] # adds a Sample column holding the first part of the file name
        df_import$DB<-names[2] # adds a DB column holding the second part of the file name
        final_table <-rbind(final_table, df_import) # Adds to existing final table
        rm(df_import)   
    }
}

This loop works fine, except that final_table contains duplicates. Any suggestions?

2 answers:

Answer 0: (score: 2)

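Well, the first if tests whether the table exists; if it does not, it creates the table from the first file's rows. So by the time you reach the second if, the table does exist, and the same rows are added again. Rather than two if statements, use a single if/else. Also, you could keep only the final_table <- ... lines inside the if and move the other lines out, so that you do not have to repeat so much code.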

Perhaps:

for (i in source_files){
    df_import<-read.table(i, header=FALSE, sep="\t") # reads in each file
    names<-unlist(strsplit(i,"_")) # splits the file name on "_" and stores the parts in 'names'
    df_import$Sample<-names[1] # adds a Sample column holding the first part of the file name
    df_import$DB<-names[2] # adds a DB column holding the second part of the file name
    if (!exists("final_table")){
        final_table<-df_import # creates the final table data frame
    } else {
        final_table <-rbind(final_table, df_import) # Adds to existing final table
    }
    rm(df_import) # remove excess df
}
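
To see the difference in isolation, here is a minimal sketch that uses three made-up one-row data frames in place of the imported files. The names chunks, table_two_ifs and table_if_else are hypothetical and chosen only for this illustration.

```r
# Hypothetical stand-ins for the imported files: three one-row data frames
chunks <- list(data.frame(x = 1), data.frame(x = 2), data.frame(x = 3))

# Start from a clean slate so exists() does not pick up a table from an earlier run
if (exists("table_two_ifs")) rm(table_two_ifs)
if (exists("table_if_else")) rm(table_if_else)

# Two independent if statements: for the first chunk both branches run
for (chunk in chunks) {
    if (!exists("table_two_ifs")) {
        table_two_ifs <- chunk                        # creates the table
    }
    if (exists("table_two_ifs")) {
        table_two_ifs <- rbind(table_two_ifs, chunk)  # now TRUE, so the first chunk is added again
    }
}

# A single if/else: exactly one branch runs per chunk
for (chunk in chunks) {
    if (!exists("table_if_else")) {
        table_if_else <- chunk
    } else {
        table_if_else <- rbind(table_if_else, chunk)
    }
}

nrow(table_two_ifs)  # 4 rows: the first chunk appears twice
nrow(table_if_else)  # 3 rows: one per chunk
```

Note that exists() also returns TRUE when a table is left over from a previous run, so rerunning either loop in the same session without removing the table first appends to the old one.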

There are better ways to do this than calling rbind on every iteration, which copies the growing table each time. See this answer: What's wrong with my function to load multiple .csv files into single dataframe in R using rbind?
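
One possible shape for this, as a sketch rather than the linked answer's exact code: store each imported table in a pre-sized list and bind everything once at the end, which removes the need for exists() entirely. The temporary directory, file names and file contents below are made up so that the example runs on its own; parts and tables are hypothetical names.

```r
# Create two small tab-separated files in a temporary directory (made-up data)
demo_dir <- tempfile("demo_")
dir.create(demo_dir)
writeLines(c("a\t1", "b\t2"), file.path(demo_dir, "s1_dbA_output.txt"))
writeLines("c\t3", file.path(demo_dir, "s2_dbB_output.txt"))

source_files <- list.files(demo_dir, pattern = "_output\\.txt$", full.names = TRUE)

# One list slot per file, filled in by the loop
tables <- vector("list", length(source_files))
for (k in seq_along(source_files)) {
    df_import <- read.table(source_files[k], header = FALSE, sep = "\t")
    # Split only the file name, not the directory part, on "_"
    parts <- unlist(strsplit(basename(source_files[k]), "_"))
    df_import$Sample <- parts[1]  # first part of the file name
    df_import$DB <- parts[2]      # second part of the file name
    tables[[k]] <- df_import
}

# Bind all tables in a single call
final_table <- do.call(rbind, tables)
```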

Answer 1: (score: 1)

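I would take a slightly different approach. It looks like the only difference between the if() blocks is how final_table is handled. I would probably do something along these lines: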

#This mimics your list.files() call
list_of_files <- list(mtcars, mtcars, mtcars)

#put the guts of your code inside a function
process_file <- function(file) {
  #your stuff goes here - I'm just going to add a random variable named foo      
  file$foo <- rnorm(nrow(file))
  return(file)
}
#use lapply to iterate over your list of files and do.call to bind them together
output <- do.call("rbind", lapply(list_of_files, process_file))

Created on 2019-01-07 by the reprex package (v0.2.1)
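
Adapted to the files from the question, the same pattern might look like the sketch below. Here process_file takes a file path instead of a data frame, which is an assumption about how the answer would be applied; the temporary directory and its two files are made up so that the example is runnable.

```r
# Made-up input files so the sketch is self-contained
demo_dir <- tempfile("demo_")
dir.create(demo_dir)
writeLines(c("a\t1", "b\t2"), file.path(demo_dir, "s1_dbA_output.txt"))
writeLines("c\t3", file.path(demo_dir, "s2_dbB_output.txt"))

# Read one file and tag its rows with the parts of its file name
process_file <- function(path) {
  df_import <- read.table(path, header = FALSE, sep = "\t")
  parts <- strsplit(basename(path), "_", fixed = TRUE)[[1]]
  df_import$Sample <- parts[1]
  df_import$DB <- parts[2]
  df_import
}

source_files <- list.files(demo_dir, pattern = "_output\\.txt$", full.names = TRUE)

# Apply the function to every file, then bind the results together
output <- do.call("rbind", lapply(source_files, process_file))
```

Because each file passes through process_file exactly once, no file can be added twice, and no state such as final_table has to be checked inside a loop.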