Regular expression to find numbers that carry only the suffixes [st || nd || rd || th]

Posted: 2014-08-12 08:56:33

Tags: java regex

I need a regular expression that finds numbers followed by these suffixes:

 1st, 2nd, 3rd, 4th, 5th.

in the following text:

 <xps:span class="ref_sn">Huang</xps:span></xps:span> <xps:span
 class="ref_au"><xps:span class="ref_gn">K.</xps:span> <xps:span
 class="ref_sn">Chingin</xps:span></xps:span> <xps:span
 class="ref_au"><xps:span class="ref_gn">R.</xps:span> <xps:span
 class="ref_sn">Zenobi</xps:span> 1st</xps:span> <xps:span
 class="ref_atitle">Real<span class='xps_ndash'>&#8211;iou</span>time,
 on<span class='xps_ndash'> 2nd &#8211;iou</span>line 4th monitoring of
 organic chemical reactions using 3rd extractive electrospray
 ionization tandem mass 5th spectrometry</xps:span> <xps:span
 class="ref_jtitle">Rapid Commun. Mass Spectrom.</xps:span>

I need to convert these suffix letters to superscript (sup).

I'm using this regular expression, but it doesn't work:

(\b)(\d+([st|nd|rd|th]+)\b)

4 answers:

Answer 0 (score: 4)

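[st|nd|rd|th] is a character class, also called a character set: it tells the regex engine to match exactly one character out of several. Inside the brackets, | has no special meaning and is just another literal character: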

[st|nd|rd|th]            any character of: 
                       's', 't', '|', 'n', 'd',
                       '|', 'r', 'd', '|', 't', 'h'

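To match one of several alternative strings, you need a group (...) instead of a class [...], for example (st|nd|rd|th).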
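The difference between the class and the group can be checked directly. The sketch below runs both patterns over a made-up test string; the class name ClassVsGroup, the helper findAll and the input "1st 2nt 3ss 4th" are illustrative assumptions, not part of the answer. With the character class, any run of the letters s, t, |, n, d, r, h is accepted as a "suffix", so nonsense like "2nt" and "3ss" matches too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClassVsGroup {
    // Hypothetical helper: collects every substring of text matched by regex.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "1st 2nt 3ss 4th";
        // Character class: one or more of the single characters s, t, |, n, d, r, h
        System.out.println(findAll("\\b\\d+[st|nd|rd|th]+\\b", text)); // [1st, 2nt, 3ss, 4th]
        // Alternation group: exactly one of the four two-letter suffixes
        System.out.println(findAll("\\b\\d+(st|nd|rd|th)\\b", text));  // [1st, 4th]
    }
}
```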


You can try:

\d+(?=st|nd|rd|th)

Here is a demo.

Sample code:

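import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "1st, 2nd, 3rd, 4th, 5th.";
// The lookahead (?=...) requires a suffix after the digits but does not include it in the match
Pattern p = Pattern.compile("\\d+(?=st|nd|rd|th)");
Matcher m = p.matcher(str);
while (m.find()) {
    System.out.println(m.group());
}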

Output:

1
2
3
4
5

Alternatively, you can rewrite the regex with capturing groups, as shown below, and read the group you need:

// Group 1 holds the digits, group 2 the suffix
Pattern p = Pattern.compile("(\\d+)(st|nd|rd|th)");
Matcher m = p.matcher(str);
while (m.find()) {
    System.out.println(m.group(1));
}
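Because the capturing-group version separates the digits from the suffix, the superscript conversion the question asks for can be done with a single Matcher.replaceAll, where $1 and $2 refer to the two groups. This is a sketch assuming plain HTML <sup> tags are the desired output (the question only says "sup"); the class SupSuffix and method supSuffixes are hypothetical names. The \b on both sides is an added assumption so that digits inside longer words are left alone.

```java
import java.util.regex.Pattern;

public class SupSuffix {
    // \b on both sides restricts matches to whole tokens such as "2nd"
    private static final Pattern ORDINAL = Pattern.compile("\\b(\\d+)(st|nd|rd|th)\\b");

    // Rewrites "2nd" as "2<sup>nd</sup>": $1 is the digits, $2 the suffix
    static String supSuffixes(String text) {
        return ORDINAL.matcher(text).replaceAll("$1<sup>$2</sup>");
    }

    public static void main(String[] args) {
        System.out.println(supSuffixes("on 2nd line 4th monitoring"));
        // on 2<sup>nd</sup> line 4<sup>th</sup> monitoring
    }
}
```

Applied to the markup in the question, a fragment such as "Zenobi 1st</xps:span>" becomes "Zenobi 1<sup>st</sup></xps:span>", while the entity &#8211; is untouched because its digits are followed by ";".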

Answer 1 (score: 1)

Try the following regular expression:

(\d+(?:st|nd|rd|th))

demo
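Here (?:...) is a non-capturing group, so group 1 is the whole ordinal such as "1st". One thing worth checking is that this pattern has no word boundaries, so it also matches the start of tokens like "5stars". The sketch below compares it with a \b-anchored variant; the class OrdinalBoundary, the helper group1 and the sample text are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrdinalBoundary {
    // Hypothetical helper: collects capturing group 1 of every match.
    static List<String> group1(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "1st place, 4th floor, 5stars";
        // Pattern from this answer: also picks "5st" out of "5stars"
        System.out.println(group1("(\\d+(?:st|nd|rd|th))", text));       // [1st, 4th, 5st]
        // With \b, a suffix followed by more letters is rejected
        System.out.println(group1("\\b(\\d+(?:st|nd|rd|th))\\b", text)); // [1st, 4th]
    }
}
```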

Answer 2 (score: 0)

Here is your code, slightly modified:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String s = "Huang K. Chingin R. Zenobi 1st Real–ioutime, on 2nd –iouline 4th monitoring of organic chemical reactions using 3rd extractive electrospray ionization tandem mass 5th spectrometry Rapid Commun. Mass Spectrom";
        // Digits followed by st/nd/rd/th; the suffix itself is not part of the match
        Pattern p = Pattern.compile("\\d+(?=st|nd|rd|th)");
        Matcher m = p.matcher(s);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}

Output:

1
2
4
3
5

Answer 3 (score: 0)

To extract only the numbers that precede st, nd, rd or th:

\d+(?=st|nd|rd|th)

If you want the regex to be case-insensitive, use:

(?i)\d+(?=st|nd|rd|th)
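In Java the embedded (?i) flag is equivalent to passing Pattern.CASE_INSENSITIVE to Pattern.compile. The sketch below shows both forms next to the plain pattern; the class CaseInsensitiveOrdinals, the helper digits and the mixed-case input are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaseInsensitiveOrdinals {
    // Hypothetical helper: collects the digits matched by p in text.
    static List<String> digits(Pattern p, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "1st, 2ND, 3Rd, 4th";
        // Embedded flag inside the pattern string
        Pattern embedded = Pattern.compile("(?i)\\d+(?=st|nd|rd|th)");
        // Same effect via the compile() flag argument
        Pattern flagged = Pattern.compile("\\d+(?=st|nd|rd|th)", Pattern.CASE_INSENSITIVE);
        // No flag: only lowercase suffixes satisfy the lookahead
        Pattern plain = Pattern.compile("\\d+(?=st|nd|rd|th)");
        System.out.println(digits(embedded, text)); // [1, 2, 3, 4]
        System.out.println(digits(flagged, text));  // [1, 2, 3, 4]
        System.out.println(digits(plain, text));    // [1, 4]
    }
}
```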