Question

Given:

- Input -

Keep this.
And keep this.

And keep this too.
Chomp this chomp:
Anything beyond here gets chomped.

- Output (expected) -

Keep this.
And keep this.

And keep this too.

How can I match each group with a regex so that, once "chomp:" is found, everything from the start of that line onward is selected (deleted)?

String text = "Keep this.\nAnd keep this.\n\nAnd keep this too.\n"
        + "This could be anything here chomp:\nAnything beyond here gets chomped.";
Pattern CHOMP= Pattern.compile("^((.*)chomp:(.*))$",  Pattern.MULTILINE | Pattern.DOTALL);
Matcher m = CHOMP.matcher(text);
if (m.find()) {
    int count = m.groupCount();
    //         
    // How can I match a group here to either delete or keep for expected output?
    //
    // text = <match a group to assign or replace non-desired text>;
    System.out.println(text);  // Should output contents from above -- output (expected) --
}

Answer 1

Here is one approach (a demonstration on ideone).

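I simplified the pattern slightly; however, the biggest change in my code is that it runs without the DOTALL option. With DOTALL, . will wrongly match across multiple lines.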

^(.*)chomp:(.*)

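The pattern should match once (which seems to be the intent), filling groups 1 and 2 with the text before and after "chomp:"; the rest of the data is "consumed" simply because it is left unprocessed. To get the data before the regex match (rather than the match itself), I use the following construct: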

StringBuffer sb = new StringBuffer();
matcher.appendReplacement(sb, "");

(While this could be replaced with substring, I think this idiom mirrors other patterns.)
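To make the parenthetical concrete, here is a minimal sketch that runs both variants side by side on the sample input. The class and method names (`ChompPrefix`, `viaAppendReplacement`, `viaSubstring`) are illustrative only. `appendReplacement` copies everything from the append position (0 on the first call) up to the start of the match and then appends the replacement; since `appendTail` is never called, the rest of the text is simply dropped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChompPrefix {
    // Same pattern as in the answer: MULTILINE only, no DOTALL.
    static final Pattern CHOMP = Pattern.compile("^(.*)chomp:(.*)", Pattern.MULTILINE);

    // Keeps the text before the chomp line via appendReplacement (no appendTail).
    static String viaAppendReplacement(String text) {
        Matcher m = CHOMP.matcher(text);
        if (!m.find()) {
            return text; // no chomp line: keep everything
        }
        StringBuffer sb = new StringBuffer();
        m.appendReplacement(sb, ""); // copies text[0, m.start()) and appends ""
        return sb.toString();
    }

    // Same result using substring up to the start of the match.
    static String viaSubstring(String text) {
        Matcher m = CHOMP.matcher(text);
        return m.find() ? text.substring(0, m.start()) : text;
    }

    public static void main(String[] args) {
        String text = "Keep this.\nAnd keep this.\n\nAnd keep this too.\n"
                + "Chomp this chomp:\nAnything beyond here gets chomped.";
        System.out.print(viaAppendReplacement(text));
        System.out.println(viaAppendReplacement(text).equals(viaSubstring(text)));
    }
}
```

Both return the three kept lines (with their trailing newline); `StringBuffer` is used because the `StringBuilder` overload of `appendReplacement` only exists from Java 9 on.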

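If you want line-oriented processing (suitable for large streams), the right approach is to process each line in turn. I would probably use a split or scanner approach myself, but I wanted to keep this answer close to the single monolithic regex originally asked about.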

For example:

Scanner s = new Scanner(input);
while (s.hasNextLine()) {
    // process next line and "break" if it matches the end-line condition
}
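The Scanner loop above, filled in as a minimal sketch. The end-line condition is assumed to be "the line contains `chomp:`", and `ChompLines` / `keepUntilChomp` are hypothetical names. Because `Scanner` can also wrap an `InputStream` or `Reader`, the same loop works on large inputs without loading them into one string.

```java
import java.util.Scanner;

public class ChompLines {
    // Copies lines until the first one containing "chomp:", then stops reading.
    static String keepUntilChomp(Scanner s) {
        StringBuilder out = new StringBuilder();
        while (s.hasNextLine()) {
            String line = s.nextLine(); // line terminator is stripped
            if (line.contains("chomp:")) {
                break; // end-line condition: drop this line and everything after it
            }
            out.append(line).append('\n'); // re-add a '\n' terminator
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Keep this.\nAnd keep this.\n\nAnd keep this too.\n"
                + "Chomp this chomp:\nAnything beyond here gets chomped.";
        try (Scanner s = new Scanner(text)) {
            System.out.print(keepUntilChomp(s));
        }
    }
}
```

One side effect of this design: every kept line is written back with `\n`, so the output ends with a newline even if the input's last kept line did not.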

Snippet from ideone:

String text = "Keep this.\nAnd keep this.\n\nAnd keep this too.\n"
        + "Chomp this chomp:\nAnything beyond here gets chomped.";
Pattern CHOMP = Pattern.compile("^(.*)chomp:(.*)", Pattern.MULTILINE);
Matcher m = CHOMP.matcher(text);
if (m.find()) {
    System.out.println(" LINE:" + m.group(0));
    System.out.println("BEFORE:" + m.group(1));
    System.out.println(" AFTER:" + m.group(2));
    System.out.println(">>>");
    StringBuffer sb = new StringBuffer();
    m.appendReplacement(sb, "");
    System.out.print(sb);
    System.out.println("<<<");
}
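To see what dropping DOTALL changes, this small sketch (hypothetical class `DotallDemo`) prints group 1 of the same pattern with and without the flag. With DOTALL, `(.*)` starts at the very beginning of the text and runs across lines up to "chomp:", so group 1 holds the kept lines plus the start of the chomp line ("Chomp this "). Without it, group 1 is confined to the chomp line itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotallDemo {
    // Returns group(1) of the first match, or null if nothing matches.
    static String group1(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "Keep this.\nAnd keep this.\n\nAnd keep this too.\n"
                + "Chomp this chomp:\nAnything beyond here gets chomped.";
        String regex = "^(.*)chomp:(.*)";
        // DOTALL: (.*) spans lines, from offset 0 up to "chomp:".
        System.out.println("[" + group1(regex, Pattern.MULTILINE | Pattern.DOTALL, text) + "]");
        // No DOTALL: (.*) stays on the line that contains "chomp:".
        System.out.println("[" + group1(regex, Pattern.MULTILINE, text) + "]");
    }
}
```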

Answer 2

I achieved the expected output using this approach:

    public static void main(String[] args) {
        String text = "Keep this.\nAnd keep this.\n\nAnd keep this too.\n"
                + "Chomp this chomp:\nAnything beyond here gets chomped.";
        Pattern CHOMP = Pattern.compile("[cC]homp");
        Matcher m = CHOMP.matcher(text);
        if (m.find()) {
            String s = text.substring(0, m.start());

            System.out.println(s);
        }
    }

[cC] matches an uppercase or lowercase "c"; both occur in this example. When the first instance of chomp/Chomp is found, I call substring, which keeps only the text before that match and so drops the match and everything after it.
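One caveat with `substring(0, m.start())`: it keeps whatever precedes "chomp" on the same line. That is harmless here because "Chomp" opens the line, but the question's own input has "This could be anything here chomp:". A minimal sketch that backs up to the start of the matching line instead, assuming lines end with `\n`, and using `Pattern.CASE_INSENSITIVE` in place of a character class (the class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChompFromLineStart {
    // CASE_INSENSITIVE covers "chomp", "Chomp", "CHOMP", ... without a character class.
    static final Pattern CHOMP = Pattern.compile("chomp", Pattern.CASE_INSENSITIVE);

    static String chomp(String text) {
        Matcher m = CHOMP.matcher(text);
        if (!m.find()) {
            return text; // no marker: keep everything
        }
        // Search backward from the match for the previous '\n'.
        // lastIndexOf returns -1 on the first line, so lineStart becomes 0 there.
        int lineStart = text.lastIndexOf('\n', m.start()) + 1;
        return text.substring(0, lineStart);
    }

    public static void main(String[] args) {
        String text = "Keep this.\nAnd keep this.\n\nAnd keep this too.\n"
                + "This could be anything here chomp:\nAnything beyond here gets chomped.";
        System.out.print(chomp(text));
    }
}
```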

I know you mentioned using groups; is there a particular reason for that, or is this solution sufficient?

Answer 3

One approach could be:

Split the text on the . (dot) character.
Iterate through the pieces: break out of the loop as soon as one contains chomp, otherwise print it.

Code snippet to go with this:

String text = "Keep this.\nAnd keep this.\n\nAnd keep this too.\n"
        + "Chomp this chomp:\nAnything beyond here gets chomped.";
String[] split = text.split("\\.");
for (int i = 0; i < split.length; i++) {
    if (split[i].contains("Chomp") || split[i].contains("chomp"))
        break;
    System.out.println(split[i]);
}

Output:

Keep this

And keep this


And keep this too
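Splitting on "." drops the periods and leaves the newlines inside each piece, which is why the output above has extra blank lines and no full stops. A sketch of the same break-on-chomp loop applied to lines instead, using `\R` (Java 8+, matches any line break) as the split regex; `ChompSplitLines` and `keptLines` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

public class ChompSplitLines {
    // Same loop as above, but over lines rather than '.'-separated pieces.
    static List<String> keptLines(String text) {
        List<String> kept = new ArrayList<>();
        for (String line : text.split("\\R")) {
            // Same case handling as above: stop at "chomp" or "Chomp".
            if (line.contains("Chomp") || line.contains("chomp")) {
                break;
            }
            kept.add(line); // blank lines in the middle survive the split
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "Keep this.\nAnd keep this.\n\nAnd keep this too.\n"
                + "Chomp this chomp:\nAnything beyond here gets chomped.";
        keptLines(text).forEach(System.out::println);
    }
}
```

This prints the four expected lines, periods included.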

Answer 4

String newText = text.replaceAll("(?m)^.*chomp(?s).*", "");

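The inline modifier (?m) turns on MULTILINE mode, so ^ can match at the start of a line. But DOTALL mode is still off, so if the regex doesn't find chomp on the same line, it gives up and tries again at the start of the next line. When it finds the line that contains chomp, (?s) turns on DOTALL mode so the second .* can consume the rest of the text, newlines and all.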
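A small sketch (hypothetical class `InlineFlags`) comparing the regex with and without the mid-pattern `(?s)`. Without it, the trailing `.*` stops at the end of each line, so every line containing "chomp" is emptied on its own and its newline is left behind. Note that "chomped" on the last line also contains "chomp", so that line is emptied too, leaving an extra blank line.

```java
public class InlineFlags {
    // (?s) switched on mid-pattern: the trailing .* runs to the end of the input.
    static String withDotall(String text) {
        return text.replaceAll("(?m)^.*chomp(?s).*", "");
    }

    // No (?s): each matching line is blanked individually; newlines remain.
    static String withoutDotall(String text) {
        return text.replaceAll("(?m)^.*chomp.*", "");
    }

    public static void main(String[] args) {
        String text = "Keep this.\nAnd keep this.\n\nAnd keep this too.\n"
                + "Chomp this chomp:\nAnything beyond here gets chomped.";
        System.out.print(withDotall(text));
        System.out.println("---");
        System.out.print(withoutDotall(text));
    }
}
```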

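I'm not sure what you're trying to do with groupCount(). If your goal is just to get rid of the chomp line and everything after it, you don't need capturing groups. In any case, that method only tells you how many capturing groups the regex has. That number is fixed by the Pattern associated with the Matcher; it tells you nothing about what was actually matched.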
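A quick sketch of that distinction (hypothetical class `GroupCountDemo`): `groupCount()` reports 2 for this pattern whether or not anything matched, while the captured text only exists after a successful `find()`. Calling `group(1)` before a successful match would throw `IllegalStateException`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    static final Pattern CHOMP = Pattern.compile("^(.*)chomp:(.*)", Pattern.MULTILINE);

    // Comes from the pattern alone; no matching is attempted.
    static int countFor(String text) {
        return CHOMP.matcher(text).groupCount();
    }

    // The actual captured text, available only after a successful find().
    static String firstGroup1(String text) {
        Matcher m = CHOMP.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "Keep this.\nChomp this chomp:\nTail.";
        System.out.println(countFor(text));             // same value for any input
        System.out.println(countFor("no marker here"));
        System.out.println("[" + firstGroup1(text) + "]");
    }
}
```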

Match regex back to the beginning of the same line?
