Question

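A regex for searching for words as well as returning the last character in the line. What I have so far is this -> "[a-z]$|[a-zA-Z]+"

The text is "many??? Woooooooooooords are".

The problem is that "are" is matched rather than "e": the second alternative takes priority. I want "are" to match as well as "e".

What is the solution to this?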
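To make the behaviour concrete, here is a minimal Java sketch. The class and method names are made up for illustration, and it assumes the pattern reads [a-z]$|[a-zA-Z]+. The engine scans left to right, so by the time [a-z]$ could be tried on the final "e", that letter has already been consumed as part of "are".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the question.
class AlternationDemo {
    // Collects every match of the question's pattern, left to right.
    static List<String> allMatches(String text) {
        Matcher m = Pattern.compile("[a-z]$|[a-zA-Z]+").matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [many, Woooooooooooords, are]: the final "e" is never matched on its own.
        System.out.println(allMatches("many??? Woooooooooooords are"));
    }
}
```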

Answer 1

Use capture groups:

([a-zA-Z]+([a-z]))$

See the demo.

For the text "many??? Woooooooooooords are", "are" is captured in group 1 and "e" is captured in group 2.
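In Java, reading the two groups could look roughly like this. It is only a sketch: the class and method names are hypothetical, and it returns null when the text does not end in a lowercase letter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the capture-group approach.
class CaptureGroupDemo {
    // Returns {last word, last letter}, or null if the pattern does not match.
    static String[] lastWordAndLetter(String text) {
        Matcher m = Pattern.compile("([a-zA-Z]+([a-z]))$").matcher(text);
        if (!m.find()) {
            return null;
        }
        // Group 1 is the whole last word, group 2 its final lowercase letter.
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] result = lastWordAndLetter("many??? Woooooooooooords are");
        System.out.println(result[0]); // are
        System.out.println(result[1]); // e
    }
}
```

Note that this pattern needs at least two letters in the last word, because [a-zA-Z]+ and [a-z] each consume at least one character.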

Answer 2

You can use:

([a-zA-Z]+)|([a-zA-Z]+([a-zA-Z]))$

This captures all the words as well as the last letter in the text. You need to use the "g" (global) modifier with the regex.

Answer 3

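At least in .NET, Java, JavaScript and PHP (so... it seems to be standard), Group[0] contains the whole match itself, so you only need to group the last letter in the regex:

[a-zA-Z]+([a-zA-Z])$

With your text, "many??? Woooooooooooords are", the result would be:

Group[0] = "are"

Group[1] = "e"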

Answer 4

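The main problem is that a regex cannot consume the same text more than once. You can only capture overlapping text, and that can be done inside lookarounds.

So, you can use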

(?s)^(?=.*([a-z])$)|[a-zA-Z]+

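See the regex demo

Explanation:

(?s) - enables DOTALL mode so that . can match newlines
^ - start of the string
(?=.*([a-z])$) - a positive lookahead that scans the whole string and captures the last letter. If there may be trailing whitespace, replace it with (?=.*([a-z])\\s*$). Note that you can use \\p{Ll} to match any Unicode lowercase letter.
| - or...
[a-zA-Z]+ - one or more letters (in Java you can actually use \\pL instead, to match any Unicode letter)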
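The lookahead alternative can also be tried on its own. The sketch below (class and method names are hypothetical) runs only the first alternative of the pattern, once as written and once with the \\s* variant, on input with and without trailing spaces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: exercises only the lookahead alternative.
class LookaheadDemo {
    // Returns the letter captured inside the lookahead, or null if it fails.
    static String lastLetter(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        // The match itself is empty; group 1 is filled by the capture in the lookahead.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String strict = "(?s)^(?=.*([a-z])$)";
        String lenient = "(?s)^(?=.*([a-z])\\s*$)";
        System.out.println(lastLetter(strict, "many??? Woooooooooooords are"));    // e
        System.out.println(lastLetter(strict, "many??? Woooooooooooords are  "));  // null
        System.out.println(lastLetter(lenient, "many??? Woooooooooooords are  ")); // e
    }
}
```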

Since this is Java, you just need to check whether group 1 is not null: if it is not, you have the last letter. If group 1 is null, you have a word.

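String s = "many??? Woooooooooooords are";
// Alternative 1: an empty match at the string start whose lookahead captures the last letter.
// Alternative 2: a run of letters, i.e. a word.
Pattern pattern = Pattern.compile("(?s)^(?=.*([a-z])$)|[a-zA-Z]+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    if (matcher.group(1) != null) {
        // Group 1 participated, so this match came from the lookahead alternative.
        System.out.println("Last letter: " + matcher.group(1));
    } else {
        // Group 1 is null, so this match is a word.
        System.out.println("Word found: " + matcher.group(0));
    }
}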

See the IDEONE demo
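One thing worth checking when running this in Java: as far as I can tell, Matcher.find() resumes one character past an empty match, so after the empty lookahead match at index 0 the first word can come back without its first letter ("any" instead of "many"). A possible variation, which is my own assumption and not part of the answer, is to report the words first and capture the last letter with a lookbehind at the very end of the input. The class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the answer's pattern with a variation.
class WordsThenLastLetter {
    // Runs the pattern over the text and labels each match like the loop above.
    static List<String> scan(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            if (m.group(1) != null) {
                out.add("Last letter: " + m.group(1));
            } else {
                out.add("Word found: " + m.group(0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "many??? Woooooooooooords are";
        // Pattern from the answer: the empty match at index 0 comes first.
        System.out.println(scan("(?s)^(?=.*([a-z])$)|[a-zA-Z]+", text));
        // Variation (assumption of this sketch): words first, then an empty match
        // at the end of the input whose lookbehind captures the last letter.
        System.out.println(scan("[a-zA-Z]+|(?<=([a-z]))$", text));
    }
}
```

With the variation, the only empty match sits at the end of the input, after every word has already been consumed, so no word is affected by how the matcher steps past it.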

Answer 5

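This regex, [a-zA-Z]+($(?<=[a-z]))?, is close to twice as fast as @stribizhev's.

The difference is most notable on failure, that is, when the last character is not a lowercase letter.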
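For completeness, a rough Java sketch of how this pattern might be consumed (class and method names are hypothetical). Group 1 never contains the letter itself: it is an empty string when the word ends the input in a lowercase letter and null otherwise, so the letter is read from the end of the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the optional end-of-input marker group.
class EndMarkerDemo {
    static List<String> scan(String text) {
        Matcher m = Pattern.compile("[a-zA-Z]+($(?<=[a-z]))?").matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            String word = m.group(0);
            out.add("Word found: " + word);
            // Group 1 participates (as an empty string) only for a word that
            // ends the input with a lowercase letter.
            if (m.group(1) != null) {
                out.add("Last letter: " + word.charAt(word.length() - 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [Word found: many, Word found: Woooooooooooords, Word found: are, Last letter: e]
        System.out.println(scan("many??? Woooooooooooords are"));
    }
}
```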

Benchmark

Failure (!= [a-z]$):
Sample: "many??? Woooooooooooords are in the Fountain of despaR"

Regex1:   [a-zA-Z]+($(?<=[a-z]))?
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   8
Elapsed Time:    0.68 s,   679.77 ms,   679771 µs


Regex2:   ^(?s)(?=.*([a-z])$)|[a-zA-Z]+
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   8
Elapsed Time:    1.14 s,   1139.35 ms,   1139345 µs

Success (== [a-z]$):
Sample: "many??? Woooooooooooords are in the Fountain of despar"

Regex1:   [a-zA-Z]+($(?<=[a-z]))?
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   8
Elapsed Time:    0.68 s,   678.97 ms,   678965 µs


Regex2:   ^(?s)(?=.*([a-z])$)|[a-zA-Z]+
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   9
Elapsed Time:    0.72 s,   717.28 ms,   717276 µs

Regex to match the last letter as well as the words in a sentence

5 answers: