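I have a large data frame with 713 columns and 10 rows, and I want to concatenate every 3 columns starting from column 6. The variables are named V1 to V713.
The data look like this: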
> chr1[,1:10]
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14........
1 1 rs1 116 T G 1 0 0 0 1 0 0 1 0
2 1 rs2 118 G A 1 0 0 1 0 0 0 1 0
3 1 rs3 230 A G 1 0 0 1 0 0 0 1 0
Desired result:
V1 V2 V3 V4 V5 V6 V7 V8..........
1 1 rs1 116 T G 100 010 010
2 1 rs2 118 G A 100 100 010
3 1 rs3 230 A G 100 100 010
How can I do this in R?
Thanks!
Answer 0 (score: 2)
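Assuming the columns to concatenate start at the 6th position, we subset them into a separate object ('df2'), split it into groups of three columns using a grouping variable created with gl, loop over the resulting list of data.frames and paste the elements of each row together with do.call(paste0, ...), cbind the result with the first 5 columns, and update the column names.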
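# Columns to be pasted together (6th column onward)
df2 <- df1[6:ncol(df1)]
# Split df2 into groups of three columns, paste each group row-wise, bind to the first 5 columns
dfN <- cbind(df1[1:5], sapply(split.default(df2, as.integer(gl(ncol(df2),
3, ncol(df2)))), function(x) do.call(paste0, x)))
# Rename the columns V1, V2, ...
colnames(dfN) <- paste0("V", seq_along(dfN))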
dfN
# V1 V2 V3 V4 V5 V6 V7 V8
#1 1 rs1 116 T G 100 010 010
#2 1 rs2 118 G A 100 100 010
#3 1 rs3 230 A G 100 100 010
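The same idea can be wrapped in a small helper so that it applies to any starting column and group size. This is only a sketch: `paste_every_k`, `first` and `k` are hypothetical names introduced here, the grouping index is built with integer division instead of gl, and the code assumes the number of columns from `first` onward is a multiple of `k` (in the question, 713 - 5 = 708 = 3 × 236).

```r
# Small stand-in for the question's data (same values as the example rows)
df1 <- data.frame(
  V1 = c(1L, 1L, 1L), V2 = c("rs1", "rs2", "rs3"), V3 = c(116L, 118L, 230L),
  V4 = c("T", "G", "A"), V5 = c("G", "A", "G"),
  V6 = c(1L, 1L, 1L), V7 = c(0L, 0L, 0L), V8 = c(0L, 0L, 0L),
  V9 = c(0L, 1L, 1L), V10 = c(1L, 0L, 0L), V11 = c(0L, 0L, 0L),
  V12 = c(0L, 0L, 0L), V13 = c(1L, 1L, 1L), V14 = c(0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: paste every k columns together, starting at column `first`
paste_every_k <- function(df, first = 6, k = 3) {
  rest <- df[first:ncol(df)]
  stopifnot(ncol(rest) %% k == 0)            # assumption: no leftover columns
  grp <- (seq_along(rest) - 1) %/% k + 1     # 1 1 1 2 2 2 3 3 3 ...
  # One data.frame per group of k columns; paste its columns row by row
  pasted <- lapply(split.default(rest, grp), function(x) do.call(paste0, x))
  out <- cbind(df[seq_len(first - 1)],
               as.data.frame(pasted, stringsAsFactors = FALSE))
  colnames(out) <- paste0("V", seq_along(out))
  out
}

res <- paste_every_k(df1)
res
```

Because `lapply` plus `as.data.frame` is used instead of `sapply`, the pasted columns stay ordinary character columns even when the input has a single row.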
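Another option is the tidyverse. We combine columns 'V6' through the last column into a single column 'VNew' with unite, then split it back into multiple columns with separate, using the sep argument, which also accepts numeric positions.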
library(tidyverse)
df1 %>%
  unite(VNew, V6:V14, sep="") %>%   # paste V6..V14 into one string per row
  separate(VNew, into = c("V6", "V7", "V8"), sep=c(3, 6))   # cut after characters 3 and 6
# V1 V2 V3 V4 V5 V6 V7 V8
#1 1 rs1 116 T G 100 010 010
#2 1 rs2 118 G A 100 100 010
#3 1 rs3 230 A G 100 100 010
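With 713 columns it is impractical to type the column range, the new names and the cut positions by hand. A possible way to compute them is sketched below; `wide_cols`, `cuts` and `new_names` are names made up for this illustration, and the code assumes the number of columns after the fifth is a multiple of 3. Note that in tidyr 1.3.0 and later separate() is marked as superseded (separate_wider_position() is the suggested successor), although it still works.

```r
library(tidyr)   # unite(), separate(), all_of()

# Small stand-in for the question's data (same values as the example rows)
df1 <- data.frame(
  V1 = c(1L, 1L, 1L), V2 = c("rs1", "rs2", "rs3"), V3 = c(116L, 118L, 230L),
  V4 = c("T", "G", "A"), V5 = c("G", "A", "G"),
  V6 = c(1L, 1L, 1L), V7 = c(0L, 0L, 0L), V8 = c(0L, 0L, 0L),
  V9 = c(0L, 1L, 1L), V10 = c(1L, 0L, 0L), V11 = c(0L, 0L, 0L),
  V12 = c(0L, 0L, 0L), V13 = c(1L, 1L, 1L), V14 = c(0L, 0L, 0L),
  stringsAsFactors = FALSE
)

wide_cols <- names(df1)[6:ncol(df1)]          # columns to merge (9 here)
stopifnot(length(wide_cols) %% 3 == 0)        # assumption: groups of exactly 3
n_new     <- length(wide_cols) / 3            # number of resulting columns
cuts      <- seq(3, by = 3, length.out = n_new - 1)   # split positions: 3, 6, ...
new_names <- paste0("V", 5 + seq_len(n_new))  # V6, V7, V8, ...

# Paste all wide columns into one string per row, then cut it every 3 characters
united <- unite(df1, "VNew", all_of(wide_cols), sep = "")
res2   <- separate(united, VNew, into = new_names, sep = cuts)
res2
```

This relies on every value being a single character wide (0 or 1 here); values with more digits would shift the cut positions.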
Data used above:
df1 <- structure(list(V1 = c(1L, 1L, 1L), V2 = c("rs1", "rs2", "rs3"
), V3 = c(116L, 118L, 230L), V4 = c("T", "G", "A"), V5 = c("G",
"A", "G"), V6 = c(1L, 1L, 1L), V7 = c(0L, 0L, 0L), V8 = c(0L,
0L, 0L), V9 = c(0L, 1L, 1L), V10 = c(1L, 0L, 0L), V11 = c(0L,
0L, 0L), V12 = c(0L, 0L, 0L), V13 = c(1L, 1L, 1L), V14 = c(0L,
0L, 0L)), .Names = c("V1", "V2", "V3", "V4", "V5", "V6", "V7",
"V8", "V9", "V10", "V11", "V12", "V13", "V14"), class = "data.frame",
row.names = c("1", "2", "3"))