Concatenate every three columns of a data frame in R

Asked: 2017-04-25 19:28:49

Tags: r

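I have a large data frame with 713 columns and 10 rows, and I want to concatenate every 3 columns, starting from column 6. The variables are named V1 to V713.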

The data looks like this:

> chr1[,1:10]
  V1  V2   V3  V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14........
1  1 rs1  116  T  G  1  0  0  0   1   0   0   1   0 
2  1 rs2  118  G  A  1  0  0  1   0   0   0   1   0  
3  1 rs3  230  A  G  1  0  0  1   0   0   0   1   0  

Desired result:

  V1  V2   V3  V4 V5  V6  V7   V8..........
1  1 rs1  116  T  G  100  010  010
2  1 rs2  118  G  A  100  100  010
3  1 rs3  230  A  G  100  100  010

How can I do this in R?

Thanks!

1 answer:

Answer 0 (score: 2)

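Assuming the columns to concatenate start at the 6th position, we subset them into a separate object ('df2'), split it into groups of three columns using a grouping variable created with gl, loop through the resulting list of data.frames and paste the elements of each row together with do.call(paste0, ...), then cbind the result to the first 5 columns and update the column names.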

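# Columns to be concatenated (column 6 onwards)
df2 <- df1[6:ncol(df1)]
# Split df2 into groups of three columns, paste each group row-wise,
# then bind the result to the first five columns
dfN <- cbind(df1[1:5], sapply(split.default(df2, as.integer(gl(ncol(df2),
             3, ncol(df2)))), function(x) do.call(paste0, x)))
# Rename the columns V1, V2, ...
colnames(dfN) <- paste0("V", seq_along(dfN))
dfN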
#  V1  V2  V3 V4 V5  V6  V7  V8
#1  1 rs1 116  T  G 100 010 010
#2  1 rs2 118  G  A 100 100 010
#3  1 rs3 230  A  G 100 100 010
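
As a rough sketch of how the same idea might be wrapped up for reuse, the grouping index can also be built with integer division instead of gl. The function name paste_every_n and its start and n arguments are made up for illustration and are not part of the original answer; the small df1 is rebuilt here only so the snippet runs on its own.

```r
# Example data, same values as the 'df1' used in this answer
df1 <- data.frame(
  V1 = c(1L, 1L, 1L), V2 = c("rs1", "rs2", "rs3"), V3 = c(116L, 118L, 230L),
  V4 = c("T", "G", "A"), V5 = c("G", "A", "G"),
  V6 = c(1L, 1L, 1L), V7 = c(0L, 0L, 0L), V8 = c(0L, 0L, 0L),
  V9 = c(0L, 1L, 1L), V10 = c(1L, 0L, 0L), V11 = c(0L, 0L, 0L),
  V12 = c(0L, 0L, 0L), V13 = c(1L, 1L, 1L), V14 = c(0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the original answer): paste every `n`
# columns of `df` together, starting at column `start`.
paste_every_n <- function(df, start = 6, n = 3) {
  keep <- df[seq_len(start - 1)]           # columns left untouched
  rest <- df[start:ncol(df)]               # columns to concatenate
  # Group index 1,1,1,2,2,2,...; a trailing incomplete group is pasted as is
  grp <- (seq_along(rest) - 1) %/% n + 1
  pasted <- lapply(split.default(rest, grp),
                   function(x) do.call(paste0, x))
  out <- cbind(keep, as.data.frame(pasted, stringsAsFactors = FALSE))
  colnames(out) <- paste0("V", seq_along(out))
  out
}

res <- paste_every_n(df1)
res
```

If the real data has 713 columns, the 708 columns after the first five form 236 groups of three, so the result would be expected to have 241 columns.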

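Another option is the tidyverse: we unite the columns from 'V6' to the last column into a single column 'VNew', and then separate it into multiple columns using the sep argument, which also accepts numeric positions.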

library(tidyverse)
df1 %>% 
    unite(VNew, V6:V14, sep="") %>%
    separate(VNew, into = c("V6", "V7", "V8"), sep=c(3, 6))
#  V1  V2  V3 V4 V5  V6  V7  V8
#1  1 rs1 116  T  G 100 010 010
#2  1 rs2 118  G  A 100 100 010
#3  1 rs3 230  A  G 100 100 010
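
The call above hard-codes V6:V14 and three output names, so for a wider data frame the names and split positions would have to be generated. The following is only a sketch of one way that might be done. It assumes every value is a single character (0 or 1 here), that the number of columns after the fifth is a multiple of 3, and that there are at least two groups. The variable names k, cols, into and cuts are illustrative, and only tidyr is loaded rather than the whole tidyverse.

```r
library(tidyr)

# Example data, same values as the 'df1' used in this answer
df1 <- data.frame(
  V1 = c(1L, 1L, 1L), V2 = c("rs1", "rs2", "rs3"), V3 = c(116L, 118L, 230L),
  V4 = c("T", "G", "A"), V5 = c("G", "A", "G"),
  V6 = c(1L, 1L, 1L), V7 = c(0L, 0L, 0L), V8 = c(0L, 0L, 0L),
  V9 = c(0L, 1L, 1L), V10 = c(1L, 0L, 0L), V11 = c(0L, 0L, 0L),
  V12 = c(0L, 0L, 0L), V13 = c(1L, 1L, 1L), V14 = c(0L, 0L, 0L),
  stringsAsFactors = FALSE
)

k <- 3                                     # columns per group (assumption)
cols <- names(df1)[6:ncol(df1)]            # columns to combine
n_groups <- length(cols) / k
stopifnot(n_groups == round(n_groups))     # must divide evenly

into <- paste0("V", 5 + seq_len(n_groups)) # new names: V6, V7, ...
cuts <- k * seq_len(n_groups - 1)          # split positions: 3, 6, ...

res <- df1 %>%
  unite("VNew", all_of(cols), sep = "") %>%     # one long string per row
  separate("VNew", into = into, sep = cuts)     # cut it at fixed positions
res
```

Because the string is cut by character position, this variant would give wrong groups if any cell held more than one character; the split/paste approach does not have that restriction.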

Data

df1 <- structure(list(V1 = c(1L, 1L, 1L), V2 = c("rs1", "rs2", "rs3"
), V3 = c(116L, 118L, 230L), V4 = c("T", "G", "A"), V5 = c("G", 
"A", "G"), V6 = c(1L, 1L, 1L), V7 = c(0L, 0L, 0L), V8 = c(0L, 
 0L, 0L), V9 = c(0L, 1L, 1L), V10 = c(1L, 0L, 0L), V11 = c(0L, 
 0L, 0L), V12 = c(0L, 0L, 0L), V13 = c(1L, 1L, 1L), V14 = c(0L, 
 0L, 0L)), .Names = c("V1", "V2", "V3", "V4", "V5", "V6", "V7", 
 "V8", "V9", "V10", "V11", "V12", "V13", "V14"), class = "data.frame", 
 row.names = c("1", "2", "3"))