Regular expression in Java (repeating boundary pattern)

Time: 2012-02-09 00:21:32

Tags: java regex string

Here is an example of my string:

<s id="1">Here we show that <ANAPH id="535" biotype="partof_product">the approximately 600-amino acid; region</ANAPH> something somethingelse .</s>

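The required functionality is to clean the string by removing every sequence enclosed in angle brackets, including the brackets themselves. For the example string above, the desired output is: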

Here we show that the approximately 600-amino acid; region something somethingelse .

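With regex = \<{1}.*\>{1} and the replaceAll function, the whole line gets replaced; I understand why this happens (the greedy .* runs from the first < to the last >). Can someone point out a more specific way to express the pattern as a regex, so that I get the desired output?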

Thanks.
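For reference, a minimal sketch that reproduces the behaviour described above. The class name GreedyDemo and the method stripGreedy are hypothetical, chosen only for illustration; the pattern is the one from the question.

```java
final class GreedyDemo {
    // Pattern from the question. The backslashes and the {1} quantifiers are
    // redundant, so this behaves like "<.*>".
    static final String GREEDY = "\\<{1}.*\\>{1}";

    // Hypothetical helper: removes every match of the greedy pattern.
    static String stripGreedy(String input) {
        return input.replaceAll(GREEDY, "");
    }

    public static void main(String[] args) {
        String line = "<s id=\"1\">Here we show that "
                + "<ANAPH id=\"535\" biotype=\"partof_product\">"
                + "the approximately 600-amino acid; region</ANAPH>"
                + " something somethingelse .</s>";
        // The greedy .* spans from the first '<' to the last '>',
        // which here covers the entire line.
        System.out.println("[" + stripGreedy(line) + "]"); // prints "[]"
    }
}
```

Because the line starts with < and ends with >, a single match swallows everything in between, text included.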


EDIT1:

Yes, the regex suggested by Kassym Dorsel works for the string above.

However, for the following string:

<s id="7"><ANAPH id="100216" biotype="supertype" assoc_ante="48275" assoc_rel="set-member" coref_chain="set_234">The C. elegans genome sequence</ANAPH> was completed two years ago [ 1 ] , and both the Drosophila [ 2 ] and human genomes are essentially completely sequenced at this point .</s>

the output using the regex is:

<ANAPH id="100216" biotype="supertype" assoc_ante="48275" assoc_rel="set-member" coref_chain="set_234">The C. elegans genome sequence</ANAPH> was completed two years ago [ 1 ] , and both the Drosophila [ 2 ] and human genomes are essentially completely sequenced at this point .</s>

The desired output is:

The C. elegans genome sequence was completed two years ago [ 1 ] , and both the Drosophila [ 2 ] and human genomes are essentially completely sequenced at this point .

Can you help me generalize the regex?

1 answer:

Answer 0: (score: 4)

Given this: <s id="1">Here we show that <ANAPH id="535" biotype="partof_product">the approximately 600-amino acid; region</ANAPH> something somethingelse .</s>

Using <[^>]*?> and replacing each match with an empty string gives:

Here we show that the approximately 600-amino acid; region something somethingelse .
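
A minimal Java sketch of this replacement, applied to both strings from the question. The class name TagStripper and the method stripTags are hypothetical names used only for illustration. It assumes that no attribute value contains a literal > character; this is a regex cleanup, not an XML parser.

```java
import java.util.regex.Pattern;

final class TagStripper {
    // '<', then any run of characters that are not '>', then '>'.
    // The lazy '?' is harmless but not required: [^>] can never run past
    // the closing bracket, so each match stops at the first '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*?>");

    // Hypothetical helper: removes every angle-bracketed sequence.
    static String stripTags(String input) {
        return TAG.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String first = "<s id=\"1\">Here we show that "
                + "<ANAPH id=\"535\" biotype=\"partof_product\">"
                + "the approximately 600-amino acid; region</ANAPH>"
                + " something somethingelse .</s>";
        String second = "<s id=\"7\"><ANAPH id=\"100216\" biotype=\"supertype\" "
                + "assoc_ante=\"48275\" assoc_rel=\"set-member\" "
                + "coref_chain=\"set_234\">The C. elegans genome sequence</ANAPH>"
                + " was completed two years ago [ 1 ] , and both the Drosophila"
                + " [ 2 ] and human genomes are essentially completely sequenced"
                + " at this point .</s>";
        System.out.println(stripTags(first));
        System.out.println(stripTags(second));
    }
}
```

Since each match ends at the nearest >, consecutive tags such as <s id="7"><ANAPH ...> are removed one by one, which should also cover the second string from EDIT1.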