Question

Test strings:

TEST Hello, world, 75793250
TEST TESTER Hello, world. Another word here. 75793250

Expected matches:

Hello, world, 
Hello, world. Another word here.

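I want to select everything between the uppercase words and the 8-digit number.

How can I do this?

Edit: The goal is to clean up a large text file with Notepad++. I'm testing with Notepad++ and Rubular.com.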

Answer 1

Try something like this:

/(?<=[A-Z]+(?: [A-Z]+)*\b)(?:(?!\b\d{8}).)*/

Basically:

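Look behind for one or more all-caps words separated by spaces, followed by a word boundary.
Then start matching from that point, and keep matching until you reach a word boundary followed by 8 digits.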
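
The "keep matching until a word boundary followed by 8 digits" step is a tempered greedy token: (?:(?!\b\d{8}).)* consumes one character at a time, but only at positions where the lookahead \b\d{8} does not match. A minimal Java sketch of just that token (the class name, the leadingText helper, and the sample string are hypothetical, chosen for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemperedTokenDemo {
    // Consume characters one at a time, refusing to step onto a position
    // where a word boundary is followed by 8 digits.
    static final Pattern UNTIL_8_DIGITS = Pattern.compile("(?:(?!\\b\\d{8}).)*");

    // Hypothetical helper: returns the text from the start of the input up to
    // (not including) the first 8-digit run that begins at a word boundary.
    static String leadingText(String input) {
        Matcher m = UNTIL_8_DIGITS.matcher(input);
        // lookingAt() anchors at the start but does not require the whole input to match
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        // The 4-digit "1234" does not stop the token; "75793250" does.
        System.out.println("[" + leadingText("a 1234 b 75793250 c") + "]");
    }
}
```

This prints [a 1234 b ]. Note that \d{8} has no trailing \b, so the first 8 digits of a longer number would also stop the token.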

If your regex engine complains (as mine does) about variable-length lookbehind, try this instead:

/(?:[A-Z]+(?: [A-Z]+)*\b)((?:(?!\b\d{8}).)*)/

Yields:

>> "TEST Hello, world, 75793250".match /(?:[A-Z]+(?: [A-Z]+)*\b)((?:(?!\b\d{8}).)*)/
=> #<MatchData "TEST Hello, world, " 1:" Hello, world, ">

>> "TEST TESTER Hello, world. Another word here. 75793250".match /(?:[A-Z]+(?: [A-Z]+)*\b)((?:(?!\b\d{8}).)*)/
=> #<MatchData "TEST TESTER Hello, world. Another word here. " 1:" Hello, world. Another word here. ">
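
The same non-lookbehind pattern can be ported to Java by doubling the backslashes in the string literal; group 1 then holds the wanted text. A minimal sketch (the class and the extract helper are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Answer1Port {
    // Same pattern as the non-lookbehind version, escaped for a Java string literal.
    // Group 1 holds everything after the uppercase words, up to the 8 digits.
    static final Pattern P = Pattern.compile(
            "(?:[A-Z]+(?: [A-Z]+)*\\b)((?:(?!\\b\\d{8}).)*)");

    // Returns capture group 1 of the first match, or null if nothing matches.
    static String extract(String line) {
        Matcher m = P.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "TEST Hello, world, 75793250",
            "TEST TESTER Hello, world. Another word here. 75793250"
        };
        for (String line : lines) {
            // Brackets make the leading and trailing spaces visible.
            System.out.println("[" + extract(line) + "]");
        }
    }
}
```

As in the Ruby session, group 1 keeps the leading space (" Hello, world, "), because \b is zero-width and the space after the last uppercase word falls inside the tempered token; call trim() or strip a single leading space if that is unwanted.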

Answer 2

Try the following:

\b[A-Z]+\b\s+(.*)\d{8}

Modified to exclude the uppercase words at the beginning. The text you're after is in capture group 1:

(?:\b[A-Z]+\b\s+)+(.*)\d{8}

If the uppercase words (tags) appear only at the start of the line:

^(?:\b[A-Z]+\b\s+)+(.*)\d{8}
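
To see how the three variants differ, here is a Java sketch that runs each against the second test string (the class, constant, and helper names are hypothetical). The first pattern consumes only one uppercase word, so group 1 still starts with "TESTER"; the repeated and anchored versions both give "Hello, world. Another word here. ". The anchored version differs only when an uppercase word appears mid-line: on a hypothetical line such as "Hello ABC world 12345678", the repeated version matches from "ABC" while the anchored one finds nothing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Answer2Port {
    // The three variants from this answer, escaped for Java string literals.
    static final String FIRST = "\\b[A-Z]+\\b\\s+(.*)\\d{8}";
    static final String REPEATED = "(?:\\b[A-Z]+\\b\\s+)+(.*)\\d{8}";
    static final String ANCHORED = "^(?:\\b[A-Z]+\\b\\s+)+(.*)\\d{8}";

    // Returns capture group 1 of the first match, or null if nothing matches.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "TEST TESTER Hello, world. Another word here. 75793250";
        for (String regex : new String[] {FIRST, REPEATED, ANCHORED}) {
            System.out.println(regex + " -> [" + group1(regex, s) + "]");
        }
    }
}
```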

Answer 3

You can use the following Java code:

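    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String str = "TEST TESTER Hello, world. Another word here. 75793250";
    // Group 1: the run of uppercase words (each followed by whitespace);
    // group 3: the rest of the line up to the 8 digits (group 4).
    Pattern pattern = Pattern.compile("(([A-Z]+\\s)+)([^\n]*)([0-9]{8})");
    Matcher m = pattern.matcher(str);
    while (m.find()) {
        System.out.println(m.group(3));
    }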
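
The code above handles one string at a time. Since the stated goal is a bulk cleanup of a large text file, here is a hedged Java sketch of the equivalent find-and-replace over many lines at once. It is not Notepad++'s own Replace dialog; the class name, the clean helper, and the choice of literal spaces instead of \s (so the uppercase prefix cannot run across a line break) are assumptions of this sketch.

```java
import java.util.regex.Pattern;

class FileCleanupSketch {
    // MULTILINE makes ^ and $ match at every line start and end.
    // "[A-Z]+ +" uses literal spaces rather than \s so the prefix stays on one line,
    // and "." never crosses a newline, so each line is handled on its own.
    static final Pattern LINE = Pattern.compile(
            "^(?:[A-Z]+ +)+(.*)\\d{8}$", Pattern.MULTILINE);

    // Keeps only the text between the uppercase words and the 8-digit number;
    // lines that do not match are left unchanged.
    static String clean(String text) {
        return LINE.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String text = "TEST Hello, world, 75793250\n"
                + "TEST TESTER Hello, world. Another word here. 75793250";
        System.out.println(clean(text));
    }
}
```

The greedy (.*) keeps the trailing space before the number, which matches the expected results in the question ("Hello, world, ").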

Answer 4

Use a character class to create an atom that matches only uppercase letters: [A-Z]. You want to match it multiple times (at least once?), so [A-Z]+.

Then you want to grab whatever follows: .+. But you also want to capture it, so wrap it in a named capture: (?<nameHere>.+).

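Then match the digits after the capture, so that they mark where the capture ends and don't end up inside it (since .+ matches anything). \d is the shorthand character class for digits, and we want one or more of them, hence \d+.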

Putting it all together, with whitespace (\s) between the parts:

[A-Z]+\s+(?<nameHere>.+)\s+\d+
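
A Java sketch of this pattern, with Matcher.group(String) standing in for the .NET Match API the answer refers to (the class and helper names are hypothetical). Running it shows two properties of the pattern as written: the \s+ before the digits keeps the trailing space out of the capture, and because [A-Z]+ matches only the first uppercase word, "TESTER" stays inside the capture for the second test string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Answer4Port {
    // The answer's pattern; Java 7+ uses the same (?<name>...) syntax as .NET.
    static final Pattern P = Pattern.compile("[A-Z]+\\s+(?<nameHere>.+)\\s+\\d+");

    // Returns the named group of the first match, or null if nothing matches.
    static String extract(String line) {
        Matcher m = P.matcher(line);
        // group(String) retrieves a named group, like Groups["nameHere"] in .NET.
        return m.find() ? m.group("nameHere") : null;
    }

    public static void main(String[] args) {
        System.out.println("[" + extract("TEST Hello, world, 75793250") + "]");
        System.out.println("[" + extract("TEST TESTER Hello, world. Another word here. 75793250") + "]");
    }
}
```

This prints [Hello, world,] and [TESTER Hello, world. Another word here.].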

Then pull out the named capture from the .NET Match object through its Groups collection, e.g. match.Groups["nameHere"].Value.

Regex: capture everything after uppercase letters and before digits?
