Regex not working for multiple pattern occurrences

Time: 2015-06-18 20:26:26

Tags: python regex

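I want to match, for each entry, the stretch of text that starts at the first occurrence of "genome_" and ends just before ",(", and replace it with a specific string, say "XXX".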

In the following text:

  

(ID_Bxylanisolvens_NLAE-ZL-C182_genome_orf00003 ____ Bxylanisolvens_NLAE -.._ 843_unknown ___ 1278-2120_1 _ ^^的 neighbours_ID_Bxylanisolvens_NLAE-ZL-C182_genome_orf00002_1__ID_Bxylanisolvens_NLAE-ZL-C182_genome_orf00004_1__neighbour_genes_Bxylanisolvens_NLAE -.._ Bxylanisolvens_NLAE - .. :0.00000230914009336068,((ID_Bxylanisolvens_NLAE-ZL-G421_genome_orf00003 ____ Bxylanisolvens_NLAE -.._ 843_unknown ___ 1315-2157_1 _ ^^的 neighbours_ID_Bxylanisolvens_NLAE-ZL-G421_genome_orf00002_1__ID_Bxylanisolvens_NLAE-ZL-G421_genome_orf00004_1__neighbour_genes_Bxylanisolvens_NLAE -.._ Bxylanisolvens_NLAE - .. :0.00000230914009336068,ID_Bxylanisolvens_NLAE-ZL-C339_genome_orf00003 ____ Bxylanisolvens_NLAE -.._ 843_unknown ___ 1084-1926_1 _ ^^的 neighbours_ID_Bxylanisolvens_NLAE-ZL-C339_genome_orf00002_1__ID_Bxylanisolvens_NLAE-ZL-C339_genome_orf00004_1__neighbour_genes_Bxylanisolvens_NLAE -.._ Bxylanisolvens_NLAE - .. :0.00000230914009336068)28:0.00000230914009336068,(

Expected result:

  

(ID_Bxylanisolvens_NLAE-ZL-C182_XXX,((ID_Bxylanisolvens_NLAE-ZL-G421_XXX,(

1 Answer:

Answer 0: (score: 1)

Based on your sample data and the desired output, lookarounds (a positive lookbehind and a positive lookahead) should help:

(?<=ID_Bxylanisolvens_NLAE-zl-[A-Z]\d{3,3}_)(genome.*?)(?=,\()
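  • (?<=ID_Bxylanisolvens_NLAE-zl-[A-Z]\d{3,3}_) is a lookbehind that checks for this specific character sequence immediately before the match. It may need adjusting depending on how much your real data varies.
  • (genome.*?) captures the part to be replaced. The question mark makes the quantifier non-greedy, so it stops at the earliest possible point.
  • (?=,\() is a lookahead for the character combination that marks the end of the part to be removed, without consuming it.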
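
As a rough Python sketch of how this pattern could be applied with `re.sub` (the question is tagged python): the `text` below is an abbreviated, made-up stand-in for the tree string in the question, not the real data, and `re.IGNORECASE` is an assumption added here because the sample shows `NLAE-ZL-` while the pattern spells it `NLAE-zl-`.

```python
import re

# Abbreviated, hypothetical stand-in for the tree string in the question.
text = (
    "(ID_Bxylanisolvens_NLAE-ZL-C182_genome_orf00003_neighbours_"
    "ID_Bxylanisolvens_NLAE-ZL-C182_genome_orf00002_1:0.0000023,"
    "((ID_Bxylanisolvens_NLAE-ZL-G421_genome_orf00003_neighbours_"
    "ID_Bxylanisolvens_NLAE-ZL-G421_genome_orf00002_1:0.0000023,"
    "ID_Bxylanisolvens_NLAE-ZL-C339_genome_orf00003_1:0.0000023)28:0.0000023,("
)

# Pattern from the answer. The lookbehind is fixed-width, which Python's
# re module requires. IGNORECASE is an assumption to bridge "zl" vs "ZL".
pattern = re.compile(
    r"(?<=ID_Bxylanisolvens_NLAE-zl-[A-Z]\d{3,3}_)(genome.*?)(?=,\()",
    re.IGNORECASE,
)

# Replace each lazy match (from "genome" up to, but not including, ",(").
result = pattern.sub("XXX", text)
print(result)
```

Because `.*?` is lazy, each match stops at the first `,(` that follows it. In this sample the G421 match therefore also swallows the C339 entry, since no `,(` appears between them, which is consistent with the expected result in the question.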

See it in action: RegEx101. If you need further details or adjustments, please leave a comment.