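I have this vector of strings. I want to split each one on | and extract fields 2, 3, 4 and 10 into four separate columns. I can split the first string (test[1]) with unlist(strsplit(test,split='|',fixed=TRUE))[c(2:4,10)], but I am not sure how to handle all the strings in the vector. Any help would be appreciated.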
test <- c("PR;ANN=T|splice_region_variant&intron_variant|LOW|PER3|ENSG00000049246|transcript|ENST00000377532|protein_coding|13/20|c.1658+7G>T||||||,T|splice_region_variant&intron_variant|LOW|PER3|ENSG00000049246|transcript|ENST00000613533|protein_coding|14/21|c.1658+7G>T||||||,T|splice_region_variant&intron_variant|LOW|PER3|ENSG00000049246|transcript|ENST00000614998|protein_coding|14/22|c.1658+7G>T||||||,T|splice_region_variant&intron_variant|LOW|PER3|ENSG00000049246|transcript|ENST00000361923|protein_coding|13/20|c.1634+7G>T||||||,T|intron_variant|MODIFIER|RP3-467L1.4|ENSG00000236266|transcript|ENST00000451646|antisense|1/2|n.239+7677C>A||||||;AC=64;AC_AFR=1;AC_AMR=0;AC_Adj=64;AC_EAS=0;AC_FIN=0;AC_Het=64;AC_Hom=0;AC_NFE=63;AC_OTH=0;AC_SAS=0;AF=5.271e-04;AN=121410;AN_AFR=10404;AN_AMR=11578;AN_Adj=121084;AN_EAS=8652;AN_FIN=6614;AN_NFE=66616;AN_OTH=906;AN_SAS=16314;CSQ=A|ENSG00000236266|ENST00000451646|Transcript|intron_variant&non_coding_transcript_variant||||||rs200733001|2||-1|RP3-467L1.4|Clo... <truncated>",
"PR;ANN=G|intron_variant|MODIFIER|PIGK|ENSG00000142892|transcript|ENST00000370812|protein_coding|10/10|c.1072-59T>C||||||,G|intron_variant|MODIFIER|PIGK|ENSG00000142892|transcript|ENST00000445065|protein_coding|7/7|c.790-59T>C||||||,G|intron_variant|MODIFIER|PIGK|ENSG00000142892|transcript|ENST00000487906|nonsense_mediated_decay|6/6|n.*561-59T>C||||||"
)
Answer 0 (score: 2)
If there are multiple elements, loop through the list created by strsplit, extract the elements with [, and rbind the results into a matrix.
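The code for this answer did not survive in the text above, so the following is only a minimal sketch of the steps it describes, not necessarily the answerer's exact expression. It assumes a shortened stand-in for `test` (one annotation per string, taken from the question's data); the names `parts`, `fields`, and `result` are hypothetical.

```r
# Shortened stand-in for the question's `test` vector (assumption: one
# annotation per string is enough to illustrate the approach).
test <- c(
  "PR;ANN=T|splice_region_variant&intron_variant|LOW|PER3|ENSG00000049246|transcript|ENST00000377532|protein_coding|13/20|c.1658+7G>T||||||",
  "PR;ANN=G|intron_variant|MODIFIER|PIGK|ENSG00000142892|transcript|ENST00000370812|protein_coding|10/10|c.1072-59T>C||||||"
)

# Split every string on a literal "|"; fixed = TRUE avoids regex escaping.
parts <- strsplit(test, split = "|", fixed = TRUE)

# Apply the extraction function `[` to each list element to keep fields 2:4 and 10.
fields <- lapply(parts, `[`, c(2:4, 10))

# Stack the per-string vectors row-wise into a character matrix
# (one row per input string, one column per extracted field).
result <- do.call(rbind, fields)
result
```

Because each extracted vector has length 4, `do.call(rbind, ...)` yields one row per input string, which matches the "four columns" layout asked for in the question.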
Answer 1 (score: 1)
Try this lapply expression:
lapply(strsplit(test, '\\|'), function(x) x[c(2:4,10)])
[[1]]
[1] "splice_region_variant&intron_variant" "LOW"
[3] "PER3" "c.1658+7G>T"
[[2]]
[1] "intron_variant" "MODIFIER" "PIGK" "c.1072-59T>C"
Alternatively, use sapply:
sapply(strsplit(test, '\\|'), function(x) x[c(2:4,10)])
[,1] [,2]
[1,] "splice_region_variant&intron_variant" "intron_variant"
[2,] "LOW" "MODIFIER"
[3,] "PER3" "PIGK"
[4,] "c.1658+7G>T" "c.1072-59T>C"
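The sapply result above has one column per input string, whereas the question asks for four columns. As a possible follow-up (a sketch, not part of the original answer), the matrix can be transposed and converted to a data frame. The sample `test` vector is shortened, and the column names `effect`, `impact`, `gene`, and `hgvs_c` are hypothetical labels chosen for illustration.

```r
# Shortened stand-in for the question's `test` vector (assumption).
test <- c(
  "PR;ANN=T|splice_region_variant&intron_variant|LOW|PER3|ENSG00000049246|transcript|ENST00000377532|protein_coding|13/20|c.1658+7G>T||||||",
  "PR;ANN=G|intron_variant|MODIFIER|PIGK|ENSG00000142892|transcript|ENST00000370812|protein_coding|10/10|c.1072-59T>C||||||"
)

# vapply behaves like sapply here but checks that each result is a
# character vector of length 4, so the output is always a 4 x n matrix.
mat <- vapply(
  strsplit(test, "|", fixed = TRUE),
  function(x) x[c(2:4, 10)],
  character(4)
)

# Transpose so each input string becomes a row, then build a data frame.
df <- as.data.frame(t(mat), stringsAsFactors = FALSE)

# Hypothetical column names, not taken from the original thread.
names(df) <- c("effect", "impact", "gene", "hgvs_c")
df
```

If a string has fewer than ten fields, the missing positions come back as NA rather than raising an error, so it may be worth checking the result with `anyNA(df)`.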
Data:
test <- c("PR;ANN=T|splice_region_variant&intron_variant|LOW|PER3|ENSG00000049246|transcript|ENST00000377532|protein_coding|13/20|c.1658+7G>T||||||,T|splice_region_variant&intron_variant|LOW|PER3|ENSG00000049246|transcript|ENST00000613533|protein_coding|14/21|c.1658+7G>T||||||,T|splice_region_variant&intron_variant|LOW|PER3|ENSG00000049246|transcript|ENST00000614998|protein_coding|14/22|c.1658+7G>T||||||,T|splice_region_variant&intron_variant|LOW|PER3|ENSG00000049246|transcript|ENST00000361923|protein_coding|13/20|c.1634+7G>T||||||,T|intron_variant|MODIFIER|RP3-467L1.4|ENSG00000236266|transcript|ENST00000451646|antisense|1/2|n.239+7677C>A||||||;AC=64;AC_AFR=1;AC_AMR=0;AC_Adj=64;AC_EAS=0;AC_FIN=0;AC_Het=64;AC_Hom=0;AC_NFE=63;AC_OTH=0;AC_SAS=0;AF=5.271e-04;AN=121410;AN_AFR=10404;AN_AMR=11578;AN_Adj=121084;AN_EAS=8652;AN_FIN=6614;AN_NFE=66616;AN_OTH=906;AN_SAS=16314;CSQ=A|ENSG00000236266|ENST00000451646|Transcript|intron_variant&non_coding_transcript_variant||||||rs200733001|2||-1|RP3-467L1.4|Clo... <truncated>",
"PR;ANN=G|intron_variant|MODIFIER|PIGK|ENSG00000142892|transcript|ENST00000370812|protein_coding|10/10|c.1072-59T>C||||||,G|intron_variant|MODIFIER|PIGK|ENSG00000142892|transcript|ENST00000445065|protein_coding|7/7|c.790-59T>C||||||,G|intron_variant|MODIFIER|PIGK|ENSG00000142892|transcript|ENST00000487906|nonsense_mediated_decay|6/6|n.*561-59T>C||||||")