I have a table with a single column, and I want to pull every other row out of it and put those values into a new column.
Say my table is df:
V1
elements-of-01-to-20
ACTCTGCGACHCHAHAATT
elements-of-21-to-30
ACTAGCTATTATCGATATT
elements-of-31-to-40
CCCTTATATTGGAGCTACT
Desired result:
V1 V2
elements-of-01-to-20 ACTCTGCGACHCHAHAATT
elements-of-21-to-30 ACTAGCTATTATCGATATT
elements-of-31-to-40 CCCTTATATTGGAGCTACT
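Edit:
Thanks for all the replies. My second question: what if my dataset has several sequence lines following each line that starts with a specific term, such as elements-of: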
V1
elements-of-01-to-20
ACTCTGCGACHCHAHAATT
AGGGGATGCTGATTTAGTA
elements-of-21-to-30
ACTAGCTATTATCGATATT
=> result =>
V1 V2
elements-of-01-to-20 ACTCTGCGACHCHAHAATTAGGGGATGCTGATTTAGTA
elements-of-21-to-30 ACTAGCTATTATCGATATT
Answer 0 (score: 3)
If the pattern is the same as in the example (header and sequence rows strictly alternate):
indx <- c(TRUE, FALSE)
data.frame(V1=df$V1[indx], V2=df$V1[!indx])
# V1 V2
#1 elements-of-01-to-20 ACTCTGCGACHCHAHAATT
#2 elements-of-21-to-30 ACTAGCTATTATCGATATT
#3 elements-of-31-to-40 CCCTTATATTGGAGCTACT
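The one-liner above relies on R recycling a short logical index along the whole vector. A minimal sketch of that behaviour, using a made-up vector `x` that is not part of the question:

```r
# Hypothetical vector alternating header / sequence, for illustration only
x <- c("h1", "s1", "h2", "s2", "h3", "s3")

# A length-2 logical index is recycled to the length of x,
# so c(TRUE, FALSE) keeps positions 1, 3, 5, ... and its negation keeps 2, 4, 6, ...
indx <- c(TRUE, FALSE)
headers <- x[indx]
seqs <- x[!indx]

# Pair each header with the sequence that follows it
res <- data.frame(V1 = headers, V2 = seqs, stringsAsFactors = FALSE)
```

This only lines up when every header is followed by exactly one sequence row. With an odd number of rows the two pieces have different lengths and `data.frame()` stops with an error.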
Based on the updated dataset:
library(data.table)
setDT(df)[,list(V1=V1[1], V2=paste(V1[-1], collapse='')),
by= list(indx=cumsum(grepl('^[^A-Z]', df$V1)))][, indx:=NULL][]
# V1 V2
#1: elements-of-01-to-20 ACTCTGCGACHCHAHAATTAGGGGATGCTGATTTAGTA
#2: elements-of-21-to-30 ACTAGCTATTATCGATATT
# Data used for the updated example above
df <- structure(list(V1 = c("elements-of-01-to-20", "ACTCTGCGACHCHAHAATT",
"AGGGGATGCTGATTTAGTA", "elements-of-21-to-30", "ACTAGCTATTATCGATATT"
)), .Names = "V1", class = "data.frame", row.names = c(NA, -5L))
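For readers without data.table, the same grouping idea can be sketched in base R. This is only an illustration under two assumptions that are not in the answer: headers are detected with the pattern `^elements-of` (the answer uses `'^[^A-Z]'` instead), and every header is followed by at least one sequence line.

```r
df <- data.frame(V1 = c("elements-of-01-to-20", "ACTCTGCGACHCHAHAATT",
                        "AGGGGATGCTGATTTAGTA", "elements-of-21-to-30",
                        "ACTAGCTATTATCGATATT"),
                 stringsAsFactors = FALSE)

# TRUE on header rows; cumsum() turns that into a running group id,
# so each header and the sequence lines after it share one id
is_header <- grepl("^elements-of", df$V1)   # assumed header pattern
grp <- cumsum(is_header)

# Split the sequence lines by group and glue each group into one string
seq_chunks <- split(df$V1[!is_header], grp[!is_header])
res <- data.frame(
  V1 = df$V1[is_header],
  V2 = vapply(seq_chunks, paste, character(1), collapse = "", USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
```

If a header had no sequence lines at all, its group would vanish from `seq_chunks` and the two columns would no longer align, so that case would need separate handling.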
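Answer 1 (score: 2)
If that is just a FASTA file, then have a look at the Biostrings package. You could also do this:
# odd rows hold the names, even rows hold the sequences
MySeq <- data.frame("Name" = df$V1[seq(1, length(df$V1), by=2)],
"Seq" = df$V1[seq(2, length(df$V1), by=2)],
stringsAsFactors = FALSE)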
Answer 2 (score: 2)
Here is another way, using grepl:
#dummy data
df <- read.table(text=" V1
elements-of-01-to-20
ACTCTGCGACHCHAHAATT
elements-of-21-to-30
ACTAGCTATTATCGATATT
elements-of-31-to-40
CCCTTATATTGGAGCTACT",
as.is=TRUE,header=TRUE)
#result
cbind(df[ grepl("elements",df$V1), "V1"],
df[ !grepl("elements",df$V1), "V1"])
#output
# [,1] [,2]
# [1,] "elements-of-01-to-20" "ACTCTGCGACHCHAHAATT"
# [2,] "elements-of-21-to-30" "ACTAGCTATTATCGATATT"
# [3,] "elements-of-31-to-40" "CCCTTATATTGGAGCTACT"
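Note that `cbind()` on two character vectors returns a character matrix without column names. If a data frame with `V1` and `V2` columns is wanted, as in the question, a small variation on the same `grepl` test could look like this (a sketch, with the data rebuilt inline so it runs on its own):

```r
df <- data.frame(V1 = c("elements-of-01-to-20", "ACTCTGCGACHCHAHAATT",
                        "elements-of-21-to-30", "ACTAGCTATTATCGATATT",
                        "elements-of-31-to-40", "CCCTTATATTGGAGCTACT"),
                 stringsAsFactors = FALSE)

# Compute the header test once and reuse it for both columns
is_header <- grepl("elements", df$V1)

res <- data.frame(V1 = df$V1[is_header],
                  V2 = df$V1[!is_header],
                  stringsAsFactors = FALSE)
```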
Answer 3 (score: 1)
Try this (using a traditional programming approach):
ndf = data.frame(V1="", V2="", stringsAsFactors=FALSE)
i=1
while(i<nrow(df)){
ndf[(nrow(ndf)+1),]=c(df[i,1], df[(i+1),1])
i=i+2
}
ndf[-1,]
V1 V2
2 elements-of-01-to-20 ACTCTGCGACHCHAHAATT
3 elements-of-21-to-30 ACTAGCTATTATCGATATT
4 elements-of-31-to-40 CCCTTATATTGGAGCTACT