I have a bunch of files, each containing two columns: an identifier and the value of some trait:
ID,score
1,.02
2,.04
4,.03
... etc
I want to perform the following three operations on each of these files:
# 1: read in the file
MyData_trait1 <- read.csv("datafile_trait1.csv",header=T,sep = ",")
# 2: update column names
colnames(MyData_trait1) <- c("ID","trait1")
# 3: merge to another data frame
MyData <- merge(MyData, MyData_trait1, by.x = "ID", by.y = "ID")
I want to do this in a for loop, but so far I have only managed to read the files in, like this:
for (trait in c("trait1","trait2","trait3","trait4",
"trait5","trait6","trait7","trait8")){
# 1: read in files
varname <- paste("MyData_", trait, sep = "")
assign(varname, read.csv(paste0("datafile_", trait, ".csv"),header=T,sep = ","))
}
How can I incorporate the other two steps into this for loop? The goal is for MyData to end up as a data frame in this format:
ID trait1 trait2 trait3 trait4 trait5 trait6 trait7 trait8
1 .02 .04 .05 .05 .04 .04 .07 .03
2 .04 .04 .07 .03 .02 .04 .05 .05
4 .03 .07 .05 .04 .03 .07 .05 .04
.... etc
Answer 0 (score: 1)
(i) Read in the files
vars <- paste0("trait", 1:8)
files <- paste0("datafile_", vars, ".csv")
all_dat <- lapply(files, read.csv, header=TRUE)
names(all_dat) <- vars
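(ii) Add the trait name to each table and combine all tables into a single data frame (long format)
library(dplyr)
# ldply() comes from plyr (which must be installed); %>% and mutate() come from dplyr
df <- plyr::ldply(vars, function(x) all_dat[[x]] %>% mutate(var=x))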
(iii) Convert the long data frame to wide format
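library(tidyr)
# spread() is superseded in tidyr >= 1.0; pivot_wider(names_from = var, values_from = score) is the newer equivalent
df %>% spread(var, score)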
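
If you would rather stay close to the for loop in the question, the three steps (read, rename, merge) can also be done in base R alone. The following is only a minimal sketch, not the method used above: the sample files written to a temporary directory, the `demo` and `dat` variables, and the use of three traits instead of eight are assumptions made so that the example runs on its own.

```r
# Hypothetical sample data so the sketch is self-contained: three small
# trait files written to a temporary directory. With real data, skip this
# part and point the read.csv() call at the existing files.
dir <- tempdir()
traits <- paste0("trait", 1:3)
for (trait in traits) {
  demo <- data.frame(ID = c(1, 2, 4), score = round(runif(3), 2))
  write.csv(demo, file.path(dir, paste0("datafile_", trait, ".csv")),
            row.names = FALSE)
}

MyData <- NULL
for (trait in traits) {
  # 1: read in the file
  dat <- read.csv(file.path(dir, paste0("datafile_", trait, ".csv")),
                  header = TRUE)
  # 2: rename the value column after the trait
  colnames(dat) <- c("ID", trait)
  # 3: merge into the accumulated data frame (the first file starts it)
  MyData <- if (is.null(MyData)) dat else merge(MyData, dat, by = "ID")
}
MyData
```

Note that merge() performs an inner join by default, so an ID missing from any one file is dropped from the result; passing all = TRUE keeps such IDs and fills the gaps with NA.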