Doing the following in a for loop in R: read files, assign column names, and merge files

Asked: 2017-11-07 00:44:58

Tags: r for-loop

I have a bunch of files that each contain two columns, an identifier and a value for some trait:

ID,score
1,.02
2,.04
4,.03
... etc

I want to perform the following 3 operations on each of these files:

# 1: read in the file
MyData_trait1 <- read.csv("datafile_trait1.csv",header=T,sep = ",")
# 2: update column names
colnames(MyData_trait1) <- c("ID","trait1")
# 3: merge to another data frame
MyData <- merge(MyData, MyData_trait1, by.x = "ID", by.y = "ID")

I would like to do this in a for loop, but so far I have only managed to read in the files like this:

for (trait in c("trait1", "trait2", "trait3", "trait4",
                "trait5", "trait6", "trait7", "trait8")) {
  # 1: read in files
  varname <- paste("MyData_", trait, sep = "")
  assign(varname, read.csv(paste0("datafile_", trait, ".csv"), header = TRUE, sep = ","))
}

How can I incorporate the other two steps into this for loop? The goal is to end up with MyData as a data frame in this format:

ID trait1 trait2 trait3 trait4 trait5 trait6 trait7 trait8 
1 .02 .04 .05 .05 .04 .04 .07 .03 
2 .04 .04 .07 .03 .02 .04 .05 .05
4 .03 .07 .05 .04 .03 .07 .05 .04
.... etc

1 answer:

Answer 0: (score: 1)

(i) Read in the files

vars <- paste0("trait", 1:8)
files <- paste0("datafile_", vars, ".csv")
all_dat <- lapply(files, read.csv, header=TRUE)
names(all_dat) <- vars
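
As an aside, if staying in base R with a for loop (as the question asks) is preferred, the named list from step (i) can be renamed and merged directly. The following is only a minimal sketch: the two small data frames are hypothetical stand-ins for what read.csv would return, and MyData is built up from NULL rather than from an existing data frame.

```r
# Hypothetical stand-in for the list produced in step (i);
# in real use all_dat would come from lapply(files, read.csv, header = TRUE)
all_dat <- list(
  trait1 = data.frame(ID = c(1, 2, 4), score = c(0.02, 0.04, 0.03)),
  trait2 = data.frame(ID = c(1, 2, 4), score = c(0.04, 0.04, 0.07))
)

MyData <- NULL
for (trait in names(all_dat)) {
  dat <- all_dat[[trait]]
  # rename the score column after the trait it belongs to
  colnames(dat) <- c("ID", trait)
  # the first table initialises MyData, later tables are merged onto it by ID
  MyData <- if (is.null(MyData)) dat else merge(MyData, dat, by = "ID")
}
MyData
```

Note that merge() keeps only IDs present in both tables by default; passing all = TRUE would keep IDs that are missing from some files and fill the gaps with NA.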

(ii) Insert the trait name into each table and combine all tables into a single data frame (long format)

library(dplyr)
# ldply() comes from the plyr package, which must be installed; mutate() and %>% come from dplyr
df <- plyr::ldply(vars, function(x) all_dat[[x]] %>% mutate(var=x))

(iii) Convert the long data frame to wide format

library(tidyr)
df %>% spread(var, score)
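
For readers on a recent tidyverse, steps (ii) and (iii) can probably be written without plyr. This is a sketch, not the answer's original method: it swaps in dplyr's bind_rows() with the .id argument to add the trait name, and tidyr's pivot_wider(), which supersedes spread() in tidyr 1.0.0 and later. The list below is hypothetical demo data standing in for all_dat from step (i).

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the named list built in step (i)
all_dat <- list(
  trait1 = data.frame(ID = c(1, 2, 4), score = c(0.02, 0.04, 0.03)),
  trait2 = data.frame(ID = c(1, 2, 4), score = c(0.04, 0.04, 0.07))
)

# (ii) stack the tables; the list names end up in a new column called "var"
long <- bind_rows(all_dat, .id = "var")

# (iii) one column per trait, one row per ID
wide <- pivot_wider(long, names_from = var, values_from = score)
wide
```

IDs that are missing from some of the files still get a row here, with NA in the corresponding trait columns.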