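I am trying to write a function that splits a data frame into one data frame per sample column, keeping the first four columns alongside each sample. Here is an example: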
df:
Name RsID Chr Position Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7
200610-1 rs423874 MT 2755 AA AA AA AA AA AA AA
200610-10 rs94753345 MT 0 AA AA AA AA AA AA AA
200610-100 rs36757 MT 15172 GG GG GG GG GG GG GG
200610-102 rs1444029 MT 125 AA AA AA AA AA AA AA
200610-105 rs3796687 MT 236 AA AA TT AA AA AT AA
200610-107 rs483795 MT 482 TT AA AA TT AA AA AA
desired output:
Name RsID Chr Position Sample1
200610-1 rs423874 MT 2755 AA
200610-10 rs94753345 MT 0 AA
200610-100 rs36757 MT 15172 GG
200610-102 rs1444029 MT 125 AA
200610-105 rs3796687 MT 236 AA
200610-107 rs483795 MT 482 TT
Name RsID Chr Position Sample2
200610-1 rs423874 MT 2755 AA
200610-10 rs94753345 MT 0 AA
200610-100 rs36757 MT 15172 GG
200610-102 rs1444029 MT 125 AA
200610-105 rs3796687 MT 236 AA
200610-107 rs483795 MT 482 AA
...
code:
sep_col <- function(df, i) {
  if (length(i) <= 1) {
    x <- cbind(df[1:4], df[i])
  } else {
    x <- list()
    for (s in 1:length(i)) x[[s]] <- list(cbind(df[1:4], df[i[s]]))
  }
  return(x)
}
If I write df[1:4] inside the function, it works. However, if I change it back to just df inside the function and run it, I get an error:
sep_col(df[1:4],6)
Error:
Error in `[.data.frame`(df, i) : undefined columns selected
Called from: `[.data.frame`(df, i)
I don't understand why this fails, since both objects are of class 'data.frame'. Any suggestions would be appreciated, thanks.
Answer 0 (score: 0)
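We can use Map to cbind columns 1:4 with each of columns 5 to 11 in turn, using setNames to give each bound column its corresponding name from names(df):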
Map(function(x, y, z) cbind(x, setNames(list(y), z)),
list(df[1:4]), df[5:11], names(df)[5:11])
#[[1]]
# Name RsID Chr Position Sample1
#1 200610-1 rs423874 MT 2755 AA
#2 200610-10 rs94753345 MT 0 AA
#3 200610-100 rs36757 MT 15172 GG
#4 200610-102 rs1444029 MT 125 AA
#5 200610-105 rs3796687 MT 236 AA
#[[2]]
# Name RsID Chr Position Sample2
#1 200610-1 rs423874 MT 2755 AA
#2 200610-10 rs94753345 MT 0 AA
#3 200610-100 rs36757 MT 15172 GG
#4 200610-102 rs1444029 MT 125 AA
#5 200610-105 rs3796687 MT 236 AA
#[[3]]
# Name RsID Chr Position Sample3
#1 200610-1 rs423874 MT 2755 AA
#2 200610-10 rs94753345 MT 0 AA
#3 200610-100 rs36757 MT 15172 GG
#4 200610-102 rs1444029 MT 125 AA
#5 200610-105 rs3796687 MT 236 TT
#[[4]]
# Name RsID Chr Position Sample4
#1 200610-1 rs423874 MT 2755 AA
#2 200610-10 rs94753345 MT 0 AA
#3 200610-100 rs36757 MT 15172 GG
#4 200610-102 rs1444029 MT 125 AA
#5 200610-105 rs3796687 MT 236 AA
#[[5]]
# Name RsID Chr Position Sample5
#1 200610-1 rs423874 MT 2755 AA
#2 200610-10 rs94753345 MT 0 AA
#3 200610-100 rs36757 MT 15172 GG
#4 200610-102 rs1444029 MT 125 AA
#5 200610-105 rs3796687 MT 236 AA
#[[6]]
# Name RsID Chr Position Sample6
#1 200610-1 rs423874 MT 2755 AA
#2 200610-10 rs94753345 MT 0 AA
#3 200610-100 rs36757 MT 15172 GG
#4 200610-102 rs1444029 MT 125 AA
#5 200610-105 rs3796687 MT 236 AT
#[[7]]
# Name RsID Chr Position Sample7
#1 200610-1 rs423874 MT 2755 AA
#2 200610-10 rs94753345 MT 0 AA
#3 200610-100 rs36757 MT 15172 GG
#4 200610-102 rs1444029 MT 125 AA
#5 200610-105 rs3796687 MT 236 AA
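The list returned by Map above is unnamed, because its first argument, list(df[1:4]), carries no names. If named elements are more convenient, one possible variation is to pass the sample columns first, so that Map names the result after them. This is only a sketch: the small df built here is a reduced, illustrative version of the question's data, and ids and res are names introduced for the example.

```r
# Illustrative data: a cut-down version of the question's df (two samples only)
df <- data.frame(
  Name     = c("200610-1", "200610-10", "200610-100"),
  RsID     = c("rs423874", "rs94753345", "rs36757"),
  Chr      = "MT",
  Position = c(2755, 0, 15172),
  Sample1  = c("AA", "AA", "GG"),
  Sample2  = c("AA", "AA", "GG"),
  stringsAsFactors = FALSE
)

ids <- df[1:4]  # identifier columns shared by every result

# The sample columns are the first argument to Map(), so the returned
# list takes its element names from them.
res <- Map(function(y, nm) cbind(ids, setNames(list(y), nm)),
           df[5:6], names(df)[5:6])

names(res)        # element names come from the sample columns
res[["Sample1"]]  # look up one sample's data frame by name
```

With named elements, a single sample can be pulled out by name rather than by position.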
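Alternatively, use lapply to loop over the names of columns 5 to 11, subset the data frame by each name, and cbind that column to the first four columns of the data frame: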
lapply(names(df)[5:11], function(x) cbind(df[1:4], df[x]))
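As for the error in the question: sep_col(df[1:4], 6) passes a data frame that has only four columns, so df[i] with i = 6 inside the function has no sixth column to select, which is what "undefined columns selected" reports. Passing the full df avoids this. Below is a minimal sketch of a corrected function along the lines of the lapply approach, assuming the first four columns are always the identifier columns. The name split_samples, its n_id parameter, and the reduced df are hypothetical and not part of the original post.

```r
# Illustrative data: a cut-down version of the question's df (three samples only)
df <- data.frame(
  Name     = c("200610-1", "200610-10", "200610-100"),
  RsID     = c("rs423874", "rs94753345", "rs36757"),
  Chr      = "MT",
  Position = c(2755, 0, 15172),
  Sample1  = c("AA", "AA", "GG"),
  Sample2  = c("AA", "AA", "GG"),
  Sample3  = c("AA", "AA", "GG"),
  stringsAsFactors = FALSE
)

# Selecting column 6 from a 4-column data frame is what fails in the question
err <- tryCatch(df[1:4][6], error = function(e) e)

# Hypothetical helper: one data frame per requested sample column,
# each keeping the first n_id identifier columns.
split_samples <- function(df, i, n_id = 4) {
  stopifnot(all(i > n_id), all(i <= ncol(df)))  # fail early on bad indices
  out <- lapply(i, function(j) cbind(df[seq_len(n_id)], df[j]))
  setNames(out, names(df)[i])                   # name each element by its sample
}

res <- split_samples(df, 5:7)  # pass the full df, not df[1:4]
names(res)
res$Sample2
```

This version always returns a list, even for a single index, so callers handle one shape of result. It also stores each data frame directly as a list element, whereas the loop in the question wraps each one in list(y), which adds an extra level of nesting.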