Question

Can you help me solve this problem?

It looks simple, but it keeps failing.

@Test
public void normalizeString(){
    StringBuilder ret =  new StringBuilder();
    //Matcher matches = Pattern.compile( "([A-Z0-9])" ).matcher("P-12345678-P");
    Matcher matches = Pattern.compile( "([\\w])" ).matcher("P-12345678-P");
    for (int i = 1; i < matches.groupCount(); i++)
        ret.append(matches.group(i));

    assertEquals("P12345678P", ret.toString());
}

Answer 1

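Constructing a Matcher does not perform any matching by itself. That is partly because Matcher supports several kinds of match operations (matches(), lookingAt(), and find()), which differ in whether the match is anchored to the start, or to both ends, of the Matcher's region. You can get the result you seem to want like this: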

@Test
public void normalizeString(){
    StringBuilder ret =  new StringBuilder();
    Matcher matches = Pattern.compile( "[A-Z0-9]+" ).matcher("P-12345678-P");

    while (matches.find()) {
        ret.append(matches.group());
    }

    assertEquals("P12345678P", ret.toString());
}

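Note in particular the call to Matcher.find(), which is the key thing missing from your version. Also, the no-argument Matcher.group() returns the substring matched by the preceding find().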
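To make the difference between the match operations concrete, here is a small sketch that runs matches(), lookingAt(), and find() against the question's input. The class and method names (MatchModes, wholeInput, prefix, countMatches) are hypothetical; only the Matcher methods come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the three Matcher operations.
class MatchModes {
    static final Pattern ALNUM = Pattern.compile("[A-Z0-9]+");

    // matches(): the entire input must match the pattern.
    static boolean wholeInput(String s) {
        return ALNUM.matcher(s).matches();
    }

    // lookingAt(): the match must start at index 0 but need not reach the end.
    static boolean prefix(String s) {
        return ALNUM.matcher(s).lookingAt();
    }

    // find(): scans forward; each call resumes after the previous match.
    static int countMatches(String s) {
        Matcher m = ALNUM.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "P-12345678-P";
        System.out.println(wholeInput(input));   // false: '-' is not in [A-Z0-9]
        System.out.println(prefix(input));       // true: "P" matches at index 0
        System.out.println(countMatches(input)); // 3: "P", "12345678", "P"
    }
}
```

Only find() walks through every match, which is why the answer's loop is built around it.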

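Also, while your use of Matcher.groupCount() isn't strictly wrong, it makes me suspect you have the wrong idea about what it does. In particular, in your code it always returns 1, because it queries the pattern, not any match against it. Since i starts at 1, the condition i < 1 is false immediately, so your loop body never runs and ret stays empty.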
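A quick way to check this is to print groupCount() without ever calling find() and to re-run the question's loop unchanged. The sketch below does that; GroupCountDemo and its methods are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: groupCount() reflects the pattern, not the input.
class GroupCountDemo {
    // Number of capturing groups in regex; no match is attempted.
    static int groupCount(String regex, String input) {
        return Pattern.compile(regex).matcher(input).groupCount();
    }

    // The question's loop, unchanged, returning what it appended.
    static String originalLoop(String input) {
        StringBuilder ret = new StringBuilder();
        Matcher matches = Pattern.compile("([\\w])").matcher(input);
        // groupCount() is 1, so "i < 1" fails on the first check and
        // group(i) is never called.
        for (int i = 1; i < matches.groupCount(); i++) {
            ret.append(matches.group(i));
        }
        return ret.toString();
    }

    public static void main(String[] args) {
        System.out.println(groupCount("([\\w])", "P-12345678-P")); // 1
        System.out.println(groupCount("([\\w])", ""));             // 1, input is irrelevant
        System.out.println(originalLoop("P-12345678-P").isEmpty()); // true
    }
}
```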

Answer 2

First, you don't need to add any groups, because group 0 always gives you access to the entire match. So instead of

(regex) and group(1)

you can use

regex and group(0)
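As a minimal sketch of that point (GroupZeroDemo and firstMatchGroup are hypothetical names), wrapping a pattern in one extra pair of parentheses just makes group(1) a copy of group(0):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: group(0) already holds the whole match.
class GroupZeroDemo {
    // Returns group(index) of the first find() match, or null if nothing matches.
    static String firstMatchGroup(String regex, String input, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        String input = "P-12345678-P";
        // With an extra capturing group, group(1) and group(0) hold the same text.
        System.out.println(firstMatchGroup("([0-9]+)", input, 1)); // 12345678
        System.out.println(firstMatchGroup("([0-9]+)", input, 0)); // 12345678
        // Without the group, group(0) alone is enough.
        System.out.println(firstMatchGroup("[0-9]+", input, 0));   // 12345678
    }
}
```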

Next, \\w is already a character class, so you don't need to surround it with another [ ]; doing so is like writing [[a-z]], which is the same as [a-z].
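The equivalence can be checked by collecting every match with and without the extra brackets. This is a sketch with a hypothetical helper, joinMatches. One caveat worth knowing: by default Java's \w means [a-zA-Z_0-9], so unlike [A-Z0-9] it also accepts lowercase letters and underscores; for the question's input that makes no difference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: [\w] and \w match the same characters.
class CharClassDemo {
    // Concatenates every find() match of regex in input.
    static String joinMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "P-12345678-P";
        // \w is already a class; wrapping it in [ ] changes nothing.
        System.out.println(joinMatches("\\w", input));   // P12345678P
        System.out.println(joinMatches("[\\w]", input)); // P12345678P
        // Likewise [[a-z]] behaves like [a-z].
        System.out.println(joinMatches("[[a-z]]", "a-b-C")); // ab
        System.out.println(joinMatches("[a-z]", "a-b-C"));   // ab
    }
}
```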

Now, in your

for (int i = 1; i < matches.groupCount(); i++)
    ret.append(matches.group(i));

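you iterate over all groups starting from 1, but you exclude the last one: groups are indexed from 1 to n, so i < n never reaches n. You would need to use i <= matches.groupCount() instead.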
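Here is a sketch of a loop with the inclusive bound, run after a successful matches(). The pattern with three capturing groups, one per alphanumeric run of the sample ID, is a hypothetical choice made for illustration, as are the class and method names; with i < m.groupCount() the last "P" would be lost.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: iterating groups 1..groupCount() inclusively.
class GroupLoopDemo {
    // Hypothetical pattern: one capturing group per alphanumeric run of the ID.
    static final Pattern ID = Pattern.compile("([A-Z])-([0-9]+)-([A-Z])");

    static String joinGroups(String input) {
        Matcher m = ID.matcher(input);
        // Groups are only available after a successful match.
        if (!m.matches()) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        // Groups are numbered 1..groupCount(), so the bound must be inclusive.
        for (int i = 1; i <= m.groupCount(); i++) {
            sb.append(m.group(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinGroups("P-12345678-P")); // P12345678P
        System.out.println(joinGroups("P12345678P"));   // null: no hyphens, so no match
    }
}
```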

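It also looks like you are mixing things up. This loop does not find all matches of the regex in the input. Once a match of the regex has been found, this loop is for iterating over the groups defined in that regex.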

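So if the regex were something like (\w(\w))c and the match were abc, then

for (int i = 1; i <= matches.groupCount(); i++)
    System.out.println(matches.group(i));

would print

ab
b

because the first group, (\w(\w)), spans two characters, and the second group is the group nested inside the first one, right after its first character.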

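But to print them, you first need to let the regex engine iterate over the input and find() a match, or use matches() to check whether the entire input matches the regex. Otherwise you will get an IllegalStateException, because the regex engine cannot know which match you want the groups from (the input can contain many matches of the regex).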
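Both halves of that explanation can be demonstrated in one sketch: group(1) throws IllegalStateException when no match has been attempted, and after find() the nested groups of (\w(\w))c on abc are "ab" and "b". NestedGroupsDemo and its methods are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: groups exist only after a successful match attempt.
class NestedGroupsDemo {
    // Returns groups 1..groupCount() of the first find() match.
    static String[] groupsOfFirstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            groups[i - 1] = m.group(i);
        }
        return groups;
    }

    // True if group(1) throws because no match has been attempted yet.
    static boolean groupBeforeFindThrows(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        try {
            m.group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Prints "ab" then "b": group 1 is (\w(\w)), group 2 is the inner (\w).
        for (String g : groupsOfFirstMatch("(\\w(\\w))c", "abc")) {
            System.out.println(g);
        }
        System.out.println(groupBeforeFindThrows("(\\w(\\w))c", "abc")); // true
    }
}
```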

So what you probably want to use is

StringBuilder ret = new StringBuilder();
Matcher matches = Pattern.compile( "[A-Z0-9]" ).matcher("P-12345678-P");
while (matches.find()) { // find next match
    ret.append(matches.group(0));
}
assertEquals("P12345678P", ret.toString());

Another (probably simpler) solution is to remove all the characters you don't want from the input. You can just use replaceAll with a negated character class

[^...]

like

String input = "P-12345678-P";
String result = input.replaceAll("[^A-Z0-9]+", "");

which produces a new string in which every character outside A-Z0-9 is removed (replaced with "").
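Wrapped as a reusable method, the replaceAll approach might look like the sketch below (ReplaceAllDemo, normalize, and normalizeCompiled are hypothetical names). String.replaceAll compiles its regex on every call, so the second variant keeps a precompiled Pattern and calls Matcher.replaceAll, which gives the same result. Note that lowercase letters and underscores fall outside [A-Z0-9] and are dropped too.

```java
import java.util.regex.Pattern;

// Hypothetical demo of the negated-character-class approach.
class ReplaceAllDemo {
    // Any run of characters that are not uppercase ASCII letters or digits.
    private static final Pattern UNWANTED = Pattern.compile("[^A-Z0-9]+");

    // One-liner: String.replaceAll compiles the regex on each call.
    static String normalize(String input) {
        return input.replaceAll("[^A-Z0-9]+", "");
    }

    // Same result, reusing the precompiled Pattern across calls.
    static String normalizeCompiled(String input) {
        return UNWANTED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(normalize("P-12345678-P"));         // P12345678P
        System.out.println(normalizeCompiled("P-12345678-P")); // P12345678P
        // Lowercase letters and '_' are outside the class, so they are removed.
        System.out.println(normalize("p-1_2"));                // 12
    }
}
```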

