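Java RegEx - regular expression to split a paragraph into blocks by start and end markers

Asked: 2009-11-10 09:54:12

Tags: java regex split

I am new to Java regex. Please help me. Consider the following paragraph:

Paragraph: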

            Name abc
            sadghsagh
            hsajdjah Name
            ggggggggg
            !!!
            Name ggg
            dfdfddfdf Name
            !!!
            Name hhhh
            sahdgashdg Name
            asjdhjasdh
            sadasldkalskd
            asdjhakjsdhja
            !!!

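I need to split the paragraph above into blocks of text that start with Name and end with !!!. I do not want to use ! as the only delimiter for the split; the start sequence (Name) has to be part of my regular expression as well.

That is, my resulting API should look like SplitAsBlocks("Paragraph", "Starts with Name", "Ends with !!!").

How can I achieve this? Any help is appreciated.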

Now I want the same output that Brito gave, but here I have added Name after "hsajdjah". With that change the text is split as below:

Name
ggggggggg
!!!

But I need:

Name abc
sadghsagh
hsajdjah Name
ggggggggg
!!!

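That is, I have to match Name only where it is at the start of a line, not in the middle.

Please advise.

Bart, please see the following input case for your code.

I need to split the following content with your API, using the parameters start => Name and end => !. But the output differs: the input has only 3 blocks that start with Name and end with !. I have attached the output as well.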

String myInput =    "Name hhhhh class0"+ "\n"+
                     "HHHHHHHHHHHHHHHHHH"+ "\n"+
                     "!"+ "\n"+
                     "Name TTTTT TTTT"+ "\n"+
                     "GGGGGG UUUUU IIII"+ "\n"+
                     "!"+ "\n"+
                     "Name JJJJJ WWWW"+ "\n"+
                     "IIIIIIIIIIIIIIIIIIIII"+ "\n"+
                     "!"+ "\n"+
                     "RRRRRRRRRRR TTTTTTTT"+ "\n"+
                     "HHHHHH"+ "\n"+
                     "JJJJJ 1 Name class1"+ "\n"+
                     "LLLLL 5 Name class5"+ "\n"+
                     "!"+ "\n"+
                     "OOOOOO HHHH FFFFFF"+ "\n"+
                     "service 0 Name class12"+ "\n"+
                     "!"+ "\n"+
                     "JJJJJ YYYYYY 3/0"+ "\n"+
                     "KKKKKKK"+ "\n"+
                     "UUU UUU UUUUU"+ "\n"+
                     "QQQQQQQ"+ "\n"+
                         "!";
    String[] tokens = tokenize(myInput, "Name", "!");
    int n = 0;
    for(String t : tokens) {
        System.out.println("---------------------------\n"+(++n)+"\n"+t);
    }

Output:

---------------------------
1
Name hhhhh class0
HHHHHHHHHHHHHHHHHH
!
---------------------------
2
Name TTTTT TTTT
GGGGGG UUUUU IIII
!
---------------------------
3
Name JJJJJ WWWW
IIIIIIIIIIIIIIIIIIIII
!
---------------------------
4
Name class1
LLLLL 5 Name class5
!
---------------------------
5
Name class12
!

Here I need only the Name at the start of a line, not in the middle. How do I add a regex for this?

3 answers:

Answer 0 (score: 4)

Try:

import java.util.*;
import java.util.regex.*;

public class Main { 

    public static String[] tokenize(String text, String start, String end) {
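        // Old pattern: (?s) lets . match newlines, but the start marker may match mid-line.
        //Pattern p = Pattern.compile("(?s)"+Pattern.quote(start)+".*?"+Pattern.quote(end));
        // New pattern: (?m) makes ^ and $ match at line boundaries, so the start marker
        // must begin a line and the end marker must finish one.
        Pattern p = Pattern.compile("(?sm)^"+Pattern.quote(start)+".*?"+Pattern.quote(end)+"$");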

        Matcher m = p.matcher(text);
        List<String> tokens = new ArrayList<String>();
        while(m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[]{});
    }

    public static void main(String[] args) {
        String text = "Name abc" + "\n" +
            "sadghsagh"          + "\n" +
            "hsajdjah Name"      + "\n" +
            "ggggggggg"          + "\n" +
            "!!!"                + "\n" +
            "Name ggg"           + "\n" +
            "dfdfddfdf Name"     + "\n" +
            "!!!"                + "\n" +
            "Name hhhh"          + "\n" +
            "sahdgashdg Name"    + "\n" +
            "asjdhjasdh"         + "\n" +
            "sadasldkalskd"      + "\n" +
            "asdjhakjsdhja"      + "\n" +
            "!!!";
        String[] tokens = tokenize(text, "Name", "!!!");
        int n = 0;
        for(String t : tokens) {
            System.out.println("---------------------------\n"+(++n)+"\n"+t);
        }
    }
}
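To see what the anchors change, here is a minimal sketch that runs both the old and the new pattern against the second input from the question. The class name AnchoredTokenizeDemo and the boolean anchored parameter are made up for this illustration; they are not part of the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoredTokenizeDemo {

    // Second input from the question: "Name" also occurs in the middle of some lines.
    static final String INPUT = String.join("\n",
        "Name hhhhh class0", "HHHHHHHHHHHHHHHHHH", "!",
        "Name TTTTT TTTT", "GGGGGG UUUUU IIII", "!",
        "Name JJJJJ WWWW", "IIIIIIIIIIIIIIIIIIIII", "!",
        "RRRRRRRRRRR TTTTTTTT", "HHHHHH", "JJJJJ 1 Name class1",
        "LLLLL 5 Name class5", "!",
        "OOOOOO HHHH FFFFFF", "service 0 Name class12", "!",
        "JJJJJ YYYYYY 3/0", "KKKKKKK", "UUU UUU UUUUU", "QQQQQQQ", "!");

    // anchored == false reproduces the old pattern, anchored == true the new one.
    static List<String> tokenize(String text, String start, String end, boolean anchored) {
        String body = Pattern.quote(start) + ".*?" + Pattern.quote(end);
        Pattern p = anchored
            ? Pattern.compile("^" + body + "$", Pattern.DOTALL | Pattern.MULTILINE)
            : Pattern.compile(body, Pattern.DOTALL);
        List<String> tokens = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints "old pattern: 5 blocks"
        System.out.println("old pattern: " + tokenize(INPUT, "Name", "!", false).size() + " blocks");
        // prints "new pattern: 3 blocks"
        System.out.println("new pattern: " + tokenize(INPUT, "Name", "!", true).size() + " blocks");
    }
}
```

The two extra blocks found by the old pattern are the ones that begin at a mid-line Name ("Name class1" and "Name class12"); with ^ in multiline mode those positions no longer qualify as a start.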

Answer 1 (score: 3)

String s = "Name abc sadghsagh hsajdjah !!! Name ggg dfdfddfdf !!! Name hhhh sahdgashdg asjdhjasdh sadasldkalskd asdjhakjsdhja !!!!! ";
String startsWith = "Name";
String endsWith = "!!!";

// non-greedily get all groups starting with Name and ending with !!!
String pattern = String.format("(%s).*?(%s)", Pattern.quote(startsWith), Pattern.quote(endsWith));
System.out.println(pattern);

Matcher m = Pattern.compile(pattern, Pattern.DOTALL).matcher(s);
while (m.find()) 
  System.out.println(m.group());

Output:

(\QName\E).*?(\Q!!!\E)
Name abc sadghsagh hsajdjah !!!
Name ggg dfdfddfdf !!!
Name hhhh sahdgashdg asjdhjasdh sadasldkalskd asdjhakjsdhja !!!
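The \Q...\E wrappers in the printed pattern come from Pattern.quote, which makes the start and end markers match as literal text. For Name and !!! that changes little, but it matters once a marker contains regex metacharacters. A small sketch, assuming a hypothetical end marker "***" (not from the question) and made-up helper names blockPattern and firstBlock:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteDemo {

    // Builds the same kind of pattern as above from literal start and end markers.
    static String blockPattern(String startsWith, String endsWith) {
        return String.format("(%s).*?(%s)", Pattern.quote(startsWith), Pattern.quote(endsWith));
    }

    // Returns the first block found in text, or null if there is none.
    static String firstBlock(String text, String startsWith, String endsWith) {
        Matcher m = Pattern.compile(blockPattern(startsWith, endsWith), Pattern.DOTALL).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // "***" is a hypothetical marker; without quoting, each * would be read as a quantifier.
        System.out.println(blockPattern("Name", "***"));
        // prints "Name abc ***"
        System.out.println(firstBlock("Name abc *** Name def ***", "Name", "***"));
    }
}
```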

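Answer 2 (score: 0)

If you want to keep Name and !!! in the result, the following should also work.

Original attempt (struck out in the answer, superseded by the edit below):

    String[] parts = string.split("(?=(Name|!!!))");

Edit: here is the corrected version: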

String[] parts = string.split("(?<=!!!)\\s*(?=Name)");

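This splits on any whitespace between !!! and Name without consuming either marker, so both are kept in the resulting parts. If you do not want to split where !!! is immediately followed by Name, replace \\s* with \\s+ so that it matches one or more whitespace characters rather than zero or more.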
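The difference between the two quantifiers can be checked with a short sketch. The input string and the class name SplitWhitespaceDemo are invented for this illustration:

```java
import java.util.Arrays;

class SplitWhitespaceDemo {

    // Zero or more whitespace characters between the markers.
    static String[] splitStar(String s) {
        return s.split("(?<=!!!)\\s*(?=Name)");
    }

    // One or more whitespace characters between the markers.
    static String[] splitPlus(String s) {
        return s.split("(?<=!!!)\\s+(?=Name)");
    }

    public static void main(String[] args) {
        // The first "!!!" is directly followed by "Name"; the second is followed by a newline.
        String s = "Name a !!!Name b !!!\nName c !!!";
        // prints [Name a !!!, Name b !!!, Name c !!!]
        System.out.println(Arrays.toString(splitStar(s)));
        // prints [Name a !!!Name b !!!, Name c !!!]
        System.out.println(Arrays.toString(splitPlus(s)));
    }
}
```

With \\s* the zero-width position between !!! and Name is itself a split point; with \\s+ at least one whitespace character must sit between them.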

Edit 2: attached an example of input and output. The input is copied from the question:

String string = "Name hhhhh class0" + "\n" + "HHHHHHHHHHHHHHHHHH" + "\n" + "!" + "\n"
    + "Name TTTTT TTTT" + "\n" + "GGGGGG UUUUU IIII" + "\n" + "!" + "\n"
    + "Name JJJJJ WWWW" + "\n" + "IIIIIIIIIIIIIIIIIIIII" + "\n" + "!" + "\n"
    + "RRRRRRRRRRR TTTTTTTT" + "\n" + "HHHHHH" + "\n" + "JJJJJ 1 Name class1" + "\n"
    + "LLLLL 5 Name class5" + "\n" + "!" + "\n" + "OOOOOO HHHH FFFFFF" + "\n"
    + "service 0 Name class12" + "\n" + "!" + "\n" + "JJJJJ YYYYYY 3/0" + "\n" + "KKKKKKK"
    + "\n" + "UUU UUU UUUUU" + "\n" + "QQQQQQQ" + "\n" + "!";

String[] parts = string.split("(?<=!)\\s*(?=Name)");
for (String part : parts) {
    System.out.println(part);
    System.out.println("---------------------------------");
}

Output:

Name hhhhh class0
HHHHHHHHHHHHHHHHHH
!
---------------------------------
Name TTTTT TTTT
GGGGGG UUUUU IIII
!
---------------------------------
Name JJJJJ WWWW
IIIIIIIIIIIIIIIIIIIII
!
RRRRRRRRRRR TTTTTTTT
HHHHHH
JJJJJ 1 Name class1
LLLLL 5 Name class5
!
OOOOOO HHHH FFFFFF
service 0 Name class12
!
JJJJJ YYYYYY 3/0
KKKKKKK
UUU UUU UUUUU
QQQQQQQ
!
---------------------------------

Looks good?