How do I fix/edit this regular expression?

Time: 2018-11-10 07:51:14

Tags: javascript regex

  

lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [db_xref=GeneID:31583580] [protein=RepB family plasmid replication initiator protein] [protein_id=WP_003600377.1] [location=1..780] [gbkey=CDS]   ATGGCAAATACAATCAACAAAAAACAAAATCTGGCGATGCAGGCGTTGCTTAAACGCCAAGACTATCTTG

lcl|NZ_AP012542.1_cds_WP_016377574.1_2 [locus_tag=LBPC_RS14710] [db_xref=GeneID:31583581] [protein=DUF536 domain-containing protein] [protein_id=WP_016377574.1] [location=complement(1459..1956)] [gbkey=CDS]   ATGAGTAAGACCATCAAAGAACTTGCAGAGGAATTGAGCTTATCTAAATCTGGTATTCGTAAATATCTAA

I want to extract the word that follows locus_tag= (only LBPC_RS14705 and LBPC_RS14710). How do I fix this regular expression?

[locus_tag][=]\w+

2 answers:

Answer 0: (score: 1)

You can use the following regular expression to match the locus_tag:

/\[locus_tag=(\w+)]/g;

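This expression captures the word characters that follow "locus_tag=" in group 1. Because the regex has the g flag, each call to .exec(str) resumes where the previous match ended, so calling .exec(str)[1] twice returns the two tags in turn.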
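For context on why the pattern in the question does not do what was intended, here is a small sketch. Square brackets in a regex define a character class, so [locus_tag] matches a single character out of l, o, c, u, s, _, t, a, g rather than the literal text "locus_tag". The exact form of the question's pattern (with the stray spaces removed) and the shortened header string are assumptions made for illustration.

```javascript
// Assumed form of the pattern from the question, with stray spaces removed.
const original = /[locus_tag][=]\w+/;
// Escaped brackets match literally; the parentheses capture the tag value.
const fixed = /\[locus_tag=(\w+)\]/;

// Shortened header, for illustration only.
const header =
  "lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [db_xref=GeneID:31583580]";

const originalMatch = header.match(original);
const fixedMatch = header.match(fixed);

// The character class consumes only the "g" that precedes "=".
console.log(originalMatch[0]); // g=LBPC_RS14705
// Group 1 holds just the tag.
console.log(fixedMatch[1]); // LBPC_RS14705
```

The original pattern therefore finds the right place only by coincidence and still includes "g=" in the match, which is why a capture group is the more robust choice.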

See the working example below:

const str = 
`lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [db_xref=GeneID:31583580] [protein=RepB family plasmid replication initiator protein] [protein_id=WP_003600377.1] [location=1..780] [gbkey=CDS] ATGGCAAATACAATCAACAAAAAACAAAATCTGGCGATGCAGGCGTTGCTTAAACGCCAAGACTATCTTG

lcl|NZ_AP012542.1_cds_WP_016377574.1_2 [locus_tag=LBPC_RS14710] [db_xref=GeneID:31583581] [protein=DUF536 domain-containing protein] [protein_id=WP_016377574.1] [location=complement(1459..1956)] [gbkey=CDS] ATGAGTAAGACCATCAAAGAACTTGCAGAGGAATTGAGCTTATCTAAATCTGGTATTCGTAAATATCTAA`;

const regex = /\[locus_tag=(\w+)]/g;
console.log(regex.exec(str)[1]); // First call returns the first match
console.log(regex.exec(str)[1]); // Second call resumes from regex.lastIndex and returns the second match
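When the number of records is not known in advance, calling exec a fixed number of times is awkward. One possible alternative, sketched here and not part of the answer above, is String.prototype.matchAll (ES2020), which requires the g flag and yields every match. The shortened records and the variable names are assumptions for illustration.

```javascript
// Two shortened header lines joined into one string, for illustration only.
const records = [
  "lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [gbkey=CDS]",
  "lcl|NZ_AP012542.1_cds_WP_016377574.1_2 [locus_tag=LBPC_RS14710] [gbkey=CDS]",
].join("\n");

// matchAll throws a TypeError if the regex lacks the g flag.
const tagRegex = /\[locus_tag=(\w+)\]/g;

// Collect capture group 1 from every match into an array.
const tags = Array.from(records.matchAll(tagRegex), (m) => m[1]);

console.log(tags); // [ 'LBPC_RS14705', 'LBPC_RS14710' ]
```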

Answer 1: (score: 0)

You can also try either of the following approaches.

  

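Here I assume that your locus tags consist only of word characters (letters, digits, and underscores), as in the samples you posted. \w+ matches exactly those.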
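To make that assumption concrete, the sketch below shows what happens when a tag contains a character outside \w. The hyphenated tag is hypothetical and does not come from the question's data; the bracket-based pattern is one possible alternative, not something the answer proposes.

```javascript
// \w is equivalent to [A-Za-z0-9_], so it stops at the first other character.
const wordPattern = /locus_tag=(\w+)/;
// Alternative: capture everything up to the closing bracket.
const bracketPattern = /locus_tag=([^\]]+)/;

// Hypothetical header with a hyphen in the tag, for illustration only.
const hyphenated = "[locus_tag=ABC-123] [gbkey=CDS]";

const wordResult = hyphenated.match(wordPattern)[1];
const bracketResult = hyphenated.match(bracketPattern)[1];

console.log(wordResult); // ABC
console.log(bracketResult); // ABC-123
```

For tags like LBPC_RS14705 both patterns return the same value, so \w+ is sufficient for the data shown.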

     

Useful link: https://javascript.info/regexp-groups

First approach

var s1 = "lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [db_xref=GeneID:31583580] [protein=RepB family plasmid replication initiator protein] [protein_id=WP_003600377.1] [location=1..780] [gbkey=CDS] ATGGCAAATACAATCAACAAAAAACAAAATCTGGCGATGCAGGCGTTGCTTAAACGCCAAGACTATCTTG";

var s2 = "lcl|NZ_AP012542.1_cds_WP_016377574.1_2 [locus_tag=LBPC_RS14710] [db_xref=GeneID:31583581] [protein=DUF536 domain-containing protein] [protein_id=WP_016377574.1] [location=complement(1459..1956)] [gbkey=CDS] ATGAGTAAGACCATCAAAGAACTTGCAGAGGAATTGAGCTTATCTAAATCTGGTATTCGTAAATATCTAA";

const regEx = /(locus_tag=(\w+))/;

var locus_tag1 = s1.match(regEx)[2];
var locus_tag2 = s2.match(regEx)[2];

console.log(locus_tag1); // LBPC_RS14705
console.log(locus_tag2); // LBPC_RS14710

Second approach

var s1 = "lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [db_xref=GeneID:31583580] [protein=RepB family plasmid replication initiator protein] [protein_id=WP_003600377.1] [location=1..780] [gbkey=CDS] ATGGCAAATACAATCAACAAAAAACAAAATCTGGCGATGCAGGCGTTGCTTAAACGCCAAGACTATCTTG";

var s2 = "lcl|NZ_AP012542.1_cds_WP_016377574.1_2 [locus_tag=LBPC_RS14710] [db_xref=GeneID:31583581] [protein=DUF536 domain-containing protein] [protein_id=WP_016377574.1] [location=complement(1459..1956)] [gbkey=CDS] ATGAGTAAGACCATCAAAGAACTTGCAGAGGAATTGAGCTTATCTAAATCTGGTATTCGTAAATATCTAA";

const regEx = /(locus_tag=\w+)/;

var locus_tag1 = s1.match(regEx)[0].split('=')[1];
var locus_tag2 = s2.match(regEx)[0].split('=')[1];

console.log(locus_tag1); // LBPC_RS14705
console.log(locus_tag2); // LBPC_RS14710
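Both approaches index straight into the result of match, which is null when a line has no locus_tag, so they would throw a TypeError on such input. A minimal sketch of a guard follows; the helper name extractLocusTag and the shortened test strings are hypothetical and used only for illustration.

```javascript
// Hypothetical helper: returns the locus tag, or null if the header has none.
function extractLocusTag(header) {
  const m = /\[locus_tag=(\w+)\]/.exec(header);
  return m ? m[1] : null;
}

// Shortened headers, for illustration only.
const withTag = "lcl|NZ_AP012542.1_cds_WP_016377574.1_2 [locus_tag=LBPC_RS14710] [gbkey=CDS]";
const withoutTag = "lcl|NZ_AP012542.1_cds_WP_016377574.1_2 [gbkey=CDS]";

console.log(extractLocusTag(withTag)); // LBPC_RS14710
console.log(extractLocusTag(withoutTag)); // null
```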