R: split multiple columns into a list

Asked: 2018-08-07 09:47:35

Tags: r list function for-loop dataframe

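I'm trying to write a function that splits the sample columns into separate data frames, one per sample, while keeping the first four columns in each data frame. Here is an example: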

df:
Name    RsID    Chr Position    Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7
200610-1    rs423874    MT  2755    AA  AA  AA  AA  AA  AA  AA
200610-10   rs94753345  MT  0   AA  AA  AA  AA  AA  AA  AA
200610-100  rs36757 MT  15172   GG  GG  GG  GG  GG  GG  GG
200610-102  rs1444029   MT  125 AA  AA  AA  AA  AA  AA  AA
200610-105  rs3796687   MT  236 AA  AA  TT  AA  AA  AT  AA
200610-107  rs483795    MT  482 TT  AA  AA  TT  AA  AA  AA

desired output:
Name    RsID    Chr Position    Sample1
200610-1    rs423874    MT  2755    AA
200610-10   rs94753345  MT  0   AA
200610-100  rs36757 MT  15172   GG
200610-102  rs1444029   MT  125 AA
200610-105  rs3796687   MT  236 AA
200610-107  rs483795    MT  482 TT

Name    RsID    Chr Position    Sample2
200610-1    rs423874    MT  2755    AA
200610-10   rs94753345  MT  0   AA
200610-100  rs36757 MT  15172   GG
200610-102  rs1444029   MT  125 AA
200610-105  rs3796687   MT  236 AA
200610-107  rs483795    MT  482 AA   

...

code:
sep_col <- function(df, i) {
  if (length(i) <= 1) {
    x <- cbind(df[1:4], df[i])
  } else {
    x <- list()
    for (s in 1:length(i)) x[[s]] <- list(cbind(df[1:4], df[i[s]]))
  }
  return(x)
}

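If I write df[1:4] inside the function, it works. However, if I change it back to just df inside the function and run the call below, I get an error: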

sep_col(df[1:4],6)

Error:
Error in `[.data.frame`(df, i) : undefined columns selected
Called from: `[.data.frame`(df, i)

I don't understand why this fails, since both objects are of class 'data.frame'. Any suggestions would be appreciated, thanks.

1 Answer:

Answer 0 (score: 0)

We can use Map to bind columns 1:4 with each of columns 5 to 11 in turn, and use setNames to give each bound column its original name:

Map(function(x, y, z) cbind(x, setNames(list(y), z)), 
                   list(df[1:4]), df[5:11], names(df)[5:11])
#[[1]]
#        Name       RsID Chr Position Sample1
#1   200610-1   rs423874  MT     2755      AA
#2  200610-10 rs94753345  MT        0      AA
#3 200610-100    rs36757  MT    15172      GG
#4 200610-102  rs1444029  MT      125      AA
#5 200610-105  rs3796687  MT      236      AA

#[[2]]
#        Name       RsID Chr Position Sample2
#1   200610-1   rs423874  MT     2755      AA
#2  200610-10 rs94753345  MT        0      AA
#3 200610-100    rs36757  MT    15172      GG
#4 200610-102  rs1444029  MT      125      AA
#5 200610-105  rs3796687  MT      236      AA

#[[3]]
#        Name       RsID Chr Position Sample3
#1   200610-1   rs423874  MT     2755      AA
#2  200610-10 rs94753345  MT        0      AA
#3 200610-100    rs36757  MT    15172      GG
#4 200610-102  rs1444029  MT      125      AA
#5 200610-105  rs3796687  MT      236      TT

#[[4]]
#        Name       RsID Chr Position Sample4
#1   200610-1   rs423874  MT     2755      AA
#2  200610-10 rs94753345  MT        0      AA
#3 200610-100    rs36757  MT    15172      GG
#4 200610-102  rs1444029  MT      125      AA
#5 200610-105  rs3796687  MT      236      AA

#[[5]]
#        Name       RsID Chr Position Sample5
#1   200610-1   rs423874  MT     2755      AA
#2  200610-10 rs94753345  MT        0      AA
#3 200610-100    rs36757  MT    15172      GG
#4 200610-102  rs1444029  MT      125      AA
#5 200610-105  rs3796687  MT      236      AA

#[[6]]
#        Name       RsID Chr Position Sample6
#1   200610-1   rs423874  MT     2755      AA
#2  200610-10 rs94753345  MT        0      AA
#3 200610-100    rs36757  MT    15172      GG
#4 200610-102  rs1444029  MT      125      AA
#5 200610-105  rs3796687  MT      236      AT

#[[7]]
#        Name       RsID Chr Position Sample7
#1   200610-1   rs423874  MT     2755      AA
#2  200610-10 rs94753345  MT        0      AA
#3 200610-100    rs36757  MT    15172      GG
#4 200610-102  rs1444029  MT      125      AA
#5 200610-105  rs3796687  MT      236      AA
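
To try the Map approach end to end, here is a minimal sketch that rebuilds the data frame from the values in the question's table and then names the resulting list by sample. The `stringsAsFactors = FALSE` argument, the `res` variable, and the final `setNames` call on the list are my own additions, not part of the answer above. Note that `list(df[1:4])` has length one, so Map recycles it against each of the seven sample columns.

```r
# Rebuild the example data from the question (6 rows, 4 key columns, 7 samples).
df <- data.frame(
  Name     = c("200610-1", "200610-10", "200610-100",
               "200610-102", "200610-105", "200610-107"),
  RsID     = c("rs423874", "rs94753345", "rs36757",
               "rs1444029", "rs3796687", "rs483795"),
  Chr      = "MT",
  Position = c(2755, 0, 15172, 125, 236, 482),
  Sample1  = c("AA", "AA", "GG", "AA", "AA", "TT"),
  Sample2  = c("AA", "AA", "GG", "AA", "AA", "AA"),
  Sample3  = c("AA", "AA", "GG", "AA", "TT", "AA"),
  Sample4  = c("AA", "AA", "GG", "AA", "AA", "TT"),
  Sample5  = c("AA", "AA", "GG", "AA", "AA", "AA"),
  Sample6  = c("AA", "AA", "GG", "AA", "AT", "AA"),
  Sample7  = c("AA", "AA", "GG", "AA", "AA", "AA"),
  stringsAsFactors = FALSE  # assumption: keep genotypes as character
)

# x: the four key columns (recycled), y: one sample column, z: its name.
res <- Map(function(x, y, z) cbind(x, setNames(list(y), z)),
           list(df[1:4]), df[5:11], names(df)[5:11])

# Optional: name the list elements so they can be accessed as res$Sample3.
res <- setNames(res, names(df)[5:11])

length(res)
names(res$Sample3)
```

Naming the list is only a convenience; the unnamed result can still be indexed by position, as in the printed output above.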

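Or use lapply to loop over the names of columns 5 to 11, subset the dataset by that column name, and cbind the result with the first 4 columns of the dataset: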

lapply(names(df)[5:11], function(x) cbind(df[1:4], df[x]))
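
As for the error in the question, it appears to come from the call rather than from the class of the object: `sep_col(df[1:4], 6)` hands the function a data frame with only four columns, so `df[i]` with `i = 6` selects a column that does not exist. Passing the full `df` avoids that. The sketch below is a hypothetical rewrite of `sep_col`, not the asker's original; the `n_key` parameter and the trimmed two-row data frame are assumptions made for illustration.

```r
# Trimmed-down stand-in for the question's data (two rows, three samples).
df <- data.frame(
  Name     = c("200610-1", "200610-105"),
  RsID     = c("rs423874", "rs3796687"),
  Chr      = "MT",
  Position = c(2755, 236),
  Sample1  = c("AA", "AA"),
  Sample2  = c("AA", "AA"),
  Sample3  = c("AA", "TT"),
  stringsAsFactors = FALSE
)

# Hypothetical rewrite: always returns a named list with one data frame per
# requested column index, each carrying the first `n_key` columns.
sep_col <- function(df, i, n_key = 4) {
  # Fail early with a clear message if an index is outside the data frame.
  stopifnot(is.data.frame(df), all(i %in% seq_along(df)))
  out <- lapply(i, function(j) cbind(df[seq_len(n_key)], df[j]))
  setNames(out, names(df)[i])
}

parts <- sep_col(df, 5:7)  # pass the full data frame, not df[1:4]
names(parts)
names(parts$Sample3)
```

Unlike the original function, this version returns a list even for a single index, which keeps the return type consistent; whether that is preferable depends on how the result is used downstream.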