Question

[introduction][position]Lead Researcher and Research Manager[/position] in the [affiliation]Web Search and Mining Group, Microsoft Research[/affiliation]</b>.

I am a [position]lead researcher[/position] at [affiliation]Microsoft Research[/affiliation]. I am also [position]adjunct professor[/position] of [affiliation]Peking University[/affiliation], [affiliation]Xian Jiaotong University[/affiliation] and [affiliation]Nankai University[/affiliation].

I joined [affiliation]Microsoft Research[/affiliation] in June 2001. Prior to that, I worked at the Research Laboratories of NEC Corporation.

I obtained a [bsdegree]B.S.[/bsdegree] in [bsmajor]Electrical Engineering[/bsmajor] from [bsuniv]Kyoto University[/bsuniv] in [bsdate]1988[/bsdate] and a [msdegree]M.S.[/msdegree] in [msmajor]Computer Science[/msmajor] from [msuniv]Kyoto University[/msuniv] in [msdate]1990[/msdate]. I earned my [phddegree]Ph.D.[/phddegree] in [phdmajor]Computer Science[/phdmajor] from the [phduniv]University of Tokyo[/phduniv] in [phddate]1998[/phddate].

I am interested in [interests]statistical learning[/interests], [interests]natural language processing[/interests], [interests]data mining, and information retrieval[/interests].[/introduction]

I can strip all the tags from the paragraph above with:

String stripped = html.replaceAll("\\[.*?\\]", "");
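For reference, here is a minimal runnable sketch of that one-liner. The class name StripAllTags, the method stripAll and the shortened sample string are illustrative only. The reluctant quantifier `*?` is what makes each tag match on its own; a greedy `.*` would remove everything from the first `[` to the last `]`.

```java
public class StripAllTags {
    // \[.*?\] : a '[' followed by as few characters as possible, then ']'.
    // The lazy *? stops at the first ']' it can reach, so every [tag] and
    // [/tag] is removed individually and the text between tags survives.
    static String stripAll(String html) {
        return html.replaceAll("\\[.*?\\]", "");
    }

    public static void main(String[] args) {
        String sample = "I obtained a [bsdegree]B.S.[/bsdegree] from [bsuniv]Kyoto University[/bsuniv].";
        // Prints: I obtained a B.S. from Kyoto University.
        System.out.println(stripAll(sample));
    }
}
```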

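But I want to keep three pairs of tags in the paragraph: [bsuniv][/bsuniv], [msuniv][/msuniv] and [phduniv][/phduniv]. In other words, I don't want to strip tags that contain the keyword "univ". I can't find a convenient way to rewrite the regex. Can anyone help?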

Answer 1

You can use a negative lookahead assertion here:

str = str.replaceAll("\\[(.(?!univ))*?\\]", "");

或： -

str = str.replaceAll("\\[((?!univ).)*?\\]", "");
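A small runnable check that both patterns keep the "univ" tags while stripping the rest. The class name KeepUnivTags, the constant names and the shortened sample (taken from the question's paragraph) are illustrative only. Inside a tag such as [bsuniv], the lookahead fails as soon as "univ" is next, so the lazy group can never reach the closing `]` and the tag is left alone.

```java
public class KeepUnivTags {
    // Pattern 1: match one character, then assert "univ" does not follow it.
    static final String AFTER_CHAR = "\\[(.(?!univ))*?\\]";
    // Pattern 2: assert "univ" does not start here, then match one character.
    static final String BEFORE_CHAR = "\\[((?!univ).)*?\\]";

    static String strip(String text, String regex) {
        return text.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String sample = "a [bsdegree]B.S.[/bsdegree] from [bsuniv]Kyoto University[/bsuniv] in [bsdate]1988[/bsdate]";
        // Both lines print: a B.S. from [bsuniv]Kyoto University[/bsuniv] in 1988
        System.out.println(strip(sample, AFTER_CHAR));
        System.out.println(strip(sample, BEFORE_CHAR));
    }
}
```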

Both give you the desired output. There is just one difference:

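The first matches a character and then looks ahead to check that "univ" does not follow it; if it doesn't, it moves on to the next character.
The second looks ahead at the empty position before each character to check that "univ" does not start there, and only then matches that single character.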
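One practical consequence, sketched below with a hypothetical tag literally named [univ] (not in the question's data): the first pattern never checks the position right after `[`, so it still strips a tag whose name starts with "univ", while the second keeps it. The class and method names are illustrative only. For the tags in the question ([bsuniv], [msuniv], [phduniv]) both patterns behave the same.

```java
public class LookaheadDifference {
    static String afterChar(String s) {
        // Lookahead runs after each consumed character, so the position
        // immediately after '[' is never checked.
        return s.replaceAll("\\[(.(?!univ))*?\\]", "");
    }

    static String beforeChar(String s) {
        // Lookahead runs before each character, including the first one.
        return s.replaceAll("\\[((?!univ).)*?\\]", "");
    }

    public static void main(String[] args) {
        String s = "[univ]Kyoto[/univ]";  // hypothetical tag name
        System.out.println(afterChar(s));   // Kyoto[/univ]
        System.out.println(beforeChar(s));  // [univ]Kyoto[/univ]
    }
}
```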

Regex: match all tags except those containing the keyword "univ"
