Converting some rows into columns in R

Time: 2014-11-13 03:42:22

Tags: regex r row

I have a table with a single column, and I want to pull every other row out and put it into a new column. Say my table is df:

      V1
elements-of-01-to-20
ACTCTGCGACHCHAHAATT
elements-of-21-to-30
ACTAGCTATTATCGATATT
elements-of-31-to-40
CCCTTATATTGGAGCTACT

The result I want:

       V1                   V2
elements-of-01-to-20   ACTCTGCGACHCHAHAATT
elements-of-21-to-30   ACTAGCTATTATCGATATT
elements-of-31-to-40   CCCTTATATTGGAGCTACT

Edit
Thanks for all the replies. My second question: what if my data set has several sequences following each specific term such as elements-of?

   V1               => result =>         V1                   v2
elements-of-01-to-20            elements-of-01-to-20  ACTCTGCGACHCHAHAATTAGGGGATGCTGATTTAGTA
ACTCTGCGACHCHAHAATT             elements-of-21-to-30  ACTAGCTATTATCGATATT
AGGGGATGCTGATTTAGTA
elements-of-21-to-30
ACTAGCTATTATCGATATT

4 Answers:

Answer 0: (score: 3)

If the pattern is the same as in the example (header and sequence rows strictly alternate):

 indx <- c(TRUE, FALSE)
 data.frame(V1=df$V1[indx], V2=df$V1[!indx])
 #                  V1                  V2
 #1 elements-of-01-to-20 ACTCTGCGACHCHAHAATT
 #2 elements-of-21-to-30 ACTAGCTATTATCGATATT
 #3 elements-of-31-to-40 CCCTTATATTGGAGCTACT
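
The one-liner above relies on R recycling the length-2 logical vector along the whole column, so it silently gives wrong pairs if a row is missing. A minimal sketch of the same approach with a sanity check added; the guard and the name is_header are additions for illustration, not part of the answer, and the header pattern ^elements-of is assumed from the sample data:

```r
# Toy data copied from the question
df <- data.frame(
  V1 = c("elements-of-01-to-20", "ACTCTGCGACHCHAHAATT",
         "elements-of-21-to-30", "ACTAGCTATTATCGATATT",
         "elements-of-31-to-40", "CCCTTATATTGGAGCTACT"),
  stringsAsFactors = FALSE
)

# c(TRUE, FALSE) is recycled along the column, so it selects rows 1, 3, 5, ...
indx <- c(TRUE, FALSE)

# Assumed header pattern, taken from the sample data
is_header <- grepl("^elements-of", df$V1)

# Guard (an addition): stop early unless headers and sequences
# strictly alternate, starting with a header
stopifnot(nrow(df) %% 2 == 0,
          identical(is_header, rep(indx, length.out = nrow(df))))

res <- data.frame(V1 = df$V1[indx], V2 = df$V1[!indx],
                  stringsAsFactors = FALSE)
res
```

If the check fails, the data probably has more than one sequence row per header, which is the case handled in the update.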

Update

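Based on the updated data set (each row that does not start with a capital letter begins a new group, and the remaining rows of the group are pasted together):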

 library(data.table)
 setDT(df)[,list(V1=V1[1], V2=paste(V1[-1], collapse='')),
        by= list(indx=cumsum(grepl('^[^A-Z]', df$V1)))][, indx:=NULL][]

 #                   V1                                     V2
 #1: elements-of-01-to-20 ACTCTGCGACHCHAHAATTAGGGGATGCTGATTTAGTA
 #2: elements-of-21-to-30                    ACTAGCTATTATCGATATT

New data

 df <- structure(list(V1 = c("elements-of-01-to-20", "ACTCTGCGACHCHAHAATT", 
 "AGGGGATGCTGATTTAGTA", "elements-of-21-to-30", "ACTAGCTATTATCGATATT"
 )), .Names = "V1", class = "data.frame", row.names = c(NA, -5L))
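
For readers without data.table, the same grouping idea can be sketched in base R. This is a sketch under stated assumptions rather than the answer author's code: it detects header rows with the pattern ^elements-of (instead of the answer's ^[^A-Z]) and assumes every header is followed by at least one sequence row. The names is_header, grp and res are illustrative.

```r
df <- data.frame(
  V1 = c("elements-of-01-to-20", "ACTCTGCGACHCHAHAATT", "AGGGGATGCTGATTTAGTA",
         "elements-of-21-to-30", "ACTAGCTATTATCGATATT"),
  stringsAsFactors = FALSE
)

# TRUE for header rows (pattern assumed from the sample data)
is_header <- grepl("^elements-of", df$V1)

# Running count of headers: every row gets the id of the header above it
grp <- cumsum(is_header)

# Concatenate the sequence rows within each group
V2 <- as.vector(tapply(df$V1[!is_header], grp[!is_header],
                       paste, collapse = ""))

res <- data.frame(V1 = df$V1[is_header], V2 = V2, stringsAsFactors = FALSE)
res
```

A header with no sequence rows would make the two columns differ in length, so data.frame() would stop with an error instead of misaligning rows.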

Answer 1: (score: 2)

If that is just a FASTA file, take a look at the Biostrings package. You could also do this:

# odd rows hold the names, even rows hold the sequences
MySeq <- data.frame("Name" = df$V1[seq(1, length(df$V1), by = 2)],
                    "Seq" = df$V1[seq(2, length(df$V1), by = 2)],
                    stringsAsFactors = FALSE)

Answer 2: (score: 2)

Here is another way, using grepl:

#dummy data
df <- read.table(text="      V1
elements-of-01-to-20
ACTCTGCGACHCHAHAATT
elements-of-21-to-30
ACTAGCTATTATCGATATT
elements-of-31-to-40
CCCTTATATTGGAGCTACT",
                 as.is=TRUE,header=TRUE)
#result
cbind(df[ grepl("elements",df$V1), "V1"],
      df[ !grepl("elements",df$V1), "V1"])

#output
#      [,1]                   [,2]                 
# [1,] "elements-of-01-to-20" "ACTCTGCGACHCHAHAATT"
# [2,] "elements-of-21-to-30" "ACTAGCTATTATCGATATT"
# [3,] "elements-of-31-to-40" "CCCTTATATTGGAGCTACT"

Answer 3: (score: 1)

Try (using a traditional programming approach):

ndf = data.frame(V1="", V2="", stringsAsFactors=FALSE)
i=1
while(i<nrow(df)){
    ndf[(nrow(ndf)+1),]=c(df[i,1], df[(i+1),1])
    i=i+2
}
ndf[-1,]
#                    V1                  V2
#2 elements-of-01-to-20 ACTCTGCGACHCHAHAATT
#3 elements-of-21-to-30 ACTAGCTATTATCGATATT
#4 elements-of-31-to-40 CCCTTATATTGGAGCTACT
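
Growing a data frame row by row inside a loop gets slow on long inputs. As a possible alternative (not the answer author's method), the same pairing can be sketched by reshaping the column into a two-column matrix filled row by row. It assumes, like the loop, that the data has an even number of rows and strictly alternates header and sequence; ndf2 is an illustrative name.

```r
df <- data.frame(
  V1 = c("elements-of-01-to-20", "ACTCTGCGACHCHAHAATT",
         "elements-of-21-to-30", "ACTAGCTATTATCGATATT",
         "elements-of-31-to-40", "CCCTTATATTGGAGCTACT"),
  stringsAsFactors = FALSE
)

# Assumption: rows come in (header, sequence) pairs
stopifnot(nrow(df) %% 2 == 0)

# byrow = TRUE fills each matrix row with one consecutive pair of values
m <- matrix(df$V1, ncol = 2, byrow = TRUE)

ndf2 <- as.data.frame(m, stringsAsFactors = FALSE)
names(ndf2) <- c("V1", "V2")
ndf2
```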