How to split a large data frame by column and write each subset to its own CSV

Time: 2017-09-10 16:31:11

Tags: r

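I have a large csv file, new.dat, with 100 columns. I want to split new.dat by column name, keeping the first column in every new subset, and save each subset as a .csv file.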

new.dat

new.dat <- structure(list(Sequence = c("AAAAAACCTGTTCTGATA", "AAAAAAGGCTGTTACTGAGC", 
"AAAAACATTCGAGCGAGATCTCT", "AAAAACCTCGACTTCGGAAG", "AAAAAGCTCGTAGTTGAA", 
"AAAAAGCTCGTAGTTGAAC"), WT1 = c("84", "104", "80", "35", "112", 
"350"), WT2 = c("149", "478", "502", "186", "577", "911"), AGO1 = c("32", 
"147", "433", "51", "258", "353"), AGO2 = c("37", "222", "355", 
"85", "408", "420"), DCL1 = c("56", "185", "291", "48", "167", 
"273"), DCL2 = c("59", "176", "294", "31", "185", "245"), NAs = c(0L, 
0L, 0L, 0L, 0L, 0L)), .Names = c("Sequence", "WT1", "WT2", "AGO1", 
"AGO2", "DCL1", "DCL2", "NAs"), row.names = c(NA, 6L), class = "data.frame")

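So for the new.dat data shown here, the result should be seven csv files. The first file, WT1.csv, contains the Sequence and WT1 columns; the second file, WT2.csv, contains the Sequence and WT2 columns; and so on.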

Here is the code I have tried. Please suggest what I am missing. Thanks.

for (name in colnames(new.dat[-1])){
   tmp=subset(new.dat$Sequence, colnames==name)
   fn= name
   #Save the CSV file 
   write.csv(tmp,fn,row.names=FALSE)
 }

2 answers:

Answer 0 (score: 3)

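We can loop over every column name except the first with lapply, subset the dataset to the "Sequence" column plus the current column, and write that subset to a file:
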
lapply(names(new.dat)[-1], function(nm) 
   write.csv(new.dat[c("Sequence", nm)], 
       paste0(nm, ".csv"), quote = FALSE, row.names = FALSE)) 
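
As a minimal sketch of the same approach, the version below writes the files into a directory and reads one back to check its columns. The small `dat` data frame and the `out_dir` location are assumptions made for illustration and are not part of the question's data. `invisible()` only suppresses the list of `NULL`s that `lapply` would otherwise print at the console.

```r
# Hypothetical stand-in for new.dat (assumption: the first column is the key column)
dat <- data.frame(
  Sequence = c("AAAAAACCTGTTCTGATA", "AAAAAAGGCTGTTACTGAGC"),
  WT1 = c("84", "104"),
  WT2 = c("149", "478"),
  stringsAsFactors = FALSE
)

# Hypothetical output directory; a temp dir keeps the working directory clean
out_dir <- file.path(tempdir(), "split_csv")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# One file per non-key column, each holding Sequence plus that column
invisible(lapply(names(dat)[-1], function(nm) {
  write.csv(dat[c("Sequence", nm)],
            file.path(out_dir, paste0(nm, ".csv")),
            quote = FALSE, row.names = FALSE)
}))

# Read one file back to confirm which columns it contains
wt1 <- read.csv(file.path(out_dir, "WT1.csv"), stringsAsFactors = FALSE)
print(names(wt1))
```

Building the path with `file.path()` is a design choice for this sketch; the answer above writes into the current working directory instead.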

Answer 1 (score: 2)

It is easier to use column indices.

for (i in 2:ncol(new.dat)) {
    # Keep the first column (Sequence) plus the i-th column
    tmp <- new.dat[, c(1, i)]
    name <- colnames(new.dat)[i]
    fn <- paste0(name, ".csv")
    print(fn)
    # Save the CSV file
    write.csv(tmp, fn, row.names = FALSE)
}
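
One caveat with `2:ncol(new.dat)`: if a data frame had only one column, `2:1` would count down to `c(2, 1)` and the loop would fail on a column that does not exist. The sketch below shows a guarded variant using `seq_len()`. The one-column data frame `one_col` is a hypothetical example, not taken from the question.

```r
# Hypothetical data frame that has only the key column
one_col <- data.frame(Sequence = c("AAAA", "CCCC"), stringsAsFactors = FALSE)

# 2:ncol(one_col) is 2:1, i.e. c(2, 1); seq_len(...)[-1] is empty instead
idx <- seq_len(ncol(one_col))[-1]

written <- character(0)
for (i in idx) {
  fn <- file.path(tempdir(), paste0(colnames(one_col)[i], ".csv"))
  write.csv(one_col[, c(1, i)], fn, row.names = FALSE)
  written <- c(written, fn)
}

# Number of files written (the loop body never runs for one_col)
print(length(written))
```

For the seven data columns of new.dat both forms visit the same indices, so this only matters when the input might have no columns beyond the first.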