lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [db_xref=GeneID:31583580] [protein=RepB family plasmid replication initiator protein] [protein_id=WP_003600377.1] [location=1..780] [gbkey=CDS] ATGGCAAATACAATCAACAAAAAACAAAATCTGGCGATGCAGGCGTTGCTTAAACGCCAAGACTATCTTG
lcl|NZ_AP012542.1_cds_WP_016377574.1_2 [locus_tag=LBPC_RS14710] [db_xref=GeneID:31583581] [protein=DUF536 domain-containing protein] [protein_id=WP_016377574.1] [location=complement(1459..1956)] [gbkey=CDS] ATGAGTAAGACCATCAAAGAACTTGCAGAGGAATTGAGCTTATCTAAATCTGGTATTCGTAAATATCTAA
I want to extract the word that follows locus_tag= (only LBPC_RS14705 and LBPC_RS14710). How can I fix this regular expression?
[locus_tag][=]\w+
Answer 0 (score: 1)
You can use the following regular expression to match the locus_tag:
/\[locus_tag=(\w+)]/g;
In this expression, I capture the word characters after "locus_tag=", so you can access them by calling .exec(str)[1] twice to get the two tags.
See the working example below:
const str =
`lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [db_xref=GeneID:31583580] [protein=RepB family plasmid replication initiator protein] [protein_id=WP_003600377.1] [location=1..780] [gbkey=CDS] ATGGCAAATACAATCAACAAAAAACAAAATCTGGCGATGCAGGCGTTGCTTAAACGCCAAGACTATCTTG
lcl|NZ_AP012542.1_cds_WP_016377574.1_2 [locus_tag=LBPC_RS14710] [db_xref=GeneID:31583581] [protein=DUF536 domain-containing protein] [protein_id=WP_016377574.1] [location=complement(1459..1956)] [gbkey=CDS] ATGAGTAAGACCATCAAAGAACTTGCAGAGGAATTGAGCTTATCTAAATCTGGTATTCGTAAATATCTAA`;
const regex = /\[locus_tag=(\w+)]/g;
console.log(regex.exec(str)[1]); // The first exec call returns the first match
console.log(regex.exec(str)[1]); // With the g flag, the second call resumes after it and returns the second match
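For context on why the pattern in the question misbehaves, and as an alternative to calling exec repeatedly, here is a minimal sketch. It assumes an environment that supports String.prototype.matchAll (ES2020); the shortened headers and the names headers, attempted, tagPattern and locusTags are made up for illustration.

```javascript
// Two shortened headers in the same bracketed key=value style as above.
const headers = [
  "lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [gbkey=CDS]",
  "lcl|NZ_AP012542.1_cds_WP_016377574.1_2 [locus_tag=LBPC_RS14710] [gbkey=CDS]",
].join("\n");

// Unescaped brackets define a character class: [locus_tag] matches ONE of the
// characters l, o, c, u, s, _, t, a, g. It does not match the literal word.
const attempted = /[locus_tag][=]\w+/;
const firstAttempt = headers.match(attempted)[0];

// Escaping the brackets makes them literal. matchAll requires the g flag and
// yields every match together with its capture groups.
const tagPattern = /\[locus_tag=(\w+)\]/g;
const locusTags = Array.from(headers.matchAll(tagPattern), (m) => m[1]);

console.log(firstAttempt); // g=LBPC_RS14705
console.log(locusTags);    // [ 'LBPC_RS14705', 'LBPC_RS14710' ]
```

Collecting the tags this way avoids depending on the regex's lastIndex state between exec calls, and it works for any number of records.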
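Answer 1 (score: 0)
You can also try either of the following approaches.
Here I assume that your locus tags consist only of word characters, as they appear to, so \w+ matches them.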
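To make the \w+ assumption concrete, here is a small sketch of what happens when a tag contains a non-word character. The header and its tag ABC-123_45 are invented for illustration and do not come from the question's data.

```javascript
// Hypothetical header whose tag contains a hyphen, to show where \w+ stops.
const header = "lcl|example_1 [locus_tag=ABC-123_45] [gbkey=CDS]";

// \w is equivalent to [A-Za-z0-9_], so the match ends at the hyphen.
const wordOnly = header.match(/\[locus_tag=(\w+)/)[1];

// [^\]]+ instead takes every character up to the closing bracket.
const upToBracket = header.match(/\[locus_tag=([^\]]+)\]/)[1];

console.log(wordOnly);    // ABC
console.log(upToBracket); // ABC-123_45
```

The snippets that follow keep \w+, which is enough for the tags shown in the question.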
var s1 = "lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [db_xref=GeneID:31583580] [protein=RepB family plasmid replication initiator protein] [protein_id=WP_003600377.1] [location=1..780] [gbkey=CDS] ATGGCAAATACAATCAACAAAAAACAAAATCTGGCGATGCAGGCGTTGCTTAAACGCCAAGACTATCTTG";
var s2 = "lcl|NZ_AP012542.1_cds_WP_016377574.1_2 [locus_tag=LBPC_RS14710] [db_xref=GeneID:31583581] [protein=DUF536 domain-containing protein] [protein_id=WP_016377574.1] [location=complement(1459..1956)] [gbkey=CDS] ATGAGTAAGACCATCAAAGAACTTGCAGAGGAATTGAGCTTATCTAAATCTGGTATTCGTAAATATCTAA";
// First approach: group 1 is the whole "locus_tag=..." pair, group 2 is the tag itself.
const regEx = /(locus_tag=(\w+))/;
var locus_tag1 = s1.match(regEx)[2];
var locus_tag2 = s2.match(regEx)[2];
console.log(locus_tag1); // LBPC_RS14705
console.log(locus_tag2); // LBPC_RS14710
var s1 = "lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [db_xref=GeneID:31583580] [protein=RepB family plasmid replication initiator protein] [protein_id=WP_003600377.1] [location=1..780] [gbkey=CDS] ATGGCAAATACAATCAACAAAAAACAAAATCTGGCGATGCAGGCGTTGCTTAAACGCCAAGACTATCTTG";
var s2 = "lcl|NZ_AP012542.1_cds_WP_016377574.1_2 [locus_tag=LBPC_RS14710] [db_xref=GeneID:31583581] [protein=DUF536 domain-containing protein] [protein_id=WP_016377574.1] [location=complement(1459..1956)] [gbkey=CDS] ATGAGTAAGACCATCAAAGAACTTGCAGAGGAATTGAGCTTATCTAAATCTGGTATTCGTAAATATCTAA";
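// Second approach (a separate snippet, since regEx is declared with const):
// match the whole "locus_tag=..." pair, then split on '=' to keep the tag.
const regEx = /(locus_tag=\w+)/;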
var locus_tag1 = s1.match(regEx)[0].split('=')[1];
var locus_tag2 = s2.match(regEx)[0].split('=')[1];
console.log(locus_tag1); // LBPC_RS14705
console.log(locus_tag2); // LBPC_RS14710
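Both snippets index straight into the result of match, which throws a TypeError when a header has no locus_tag, because match returns null in that case. A minimal sketch of a guarded version follows; extractLocusTag is a hypothetical helper name, and the shortened headers are for illustration only.

```javascript
// extractLocusTag is a hypothetical helper, not part of either answer.
function extractLocusTag(header) {
  const match = header.match(/\[locus_tag=(\w+)\]/);
  // String.prototype.match returns null when nothing matches, so guard
  // before indexing into the result.
  return match ? match[1] : null;
}

const withTag = "lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [gbkey=CDS]";
const withoutTag = "lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [gbkey=CDS]";

console.log(extractLocusTag(withTag));    // LBPC_RS14705
console.log(extractLocusTag(withoutTag)); // null
```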