How do I use a regular expression to split text into headers and content?

Time: 2014-12-23 08:30:23

Tags: java regex

An Oracle backup log file contains headers and content like this:

Starting backup at 14-JUL-13
channel d1: starting compressed incremental level 0 datafile backup set
channel d1: specifying datafile(s) in backup set
input datafile file number=00004 name=/oradata/reports1/qqq01.dbf
input datafile file number=00001 name=/oradata/reports1/aaa01.dbf
input datafile file number=00002 name=/oradata/reports1/xxx01.dbf
input datafile file number=00003 name=/oradata/reports1/bbbs01.dbf
<...>

Starting backup at 15-JUL-13
current log archived
channel d1: starting compressed archived log backup set
channel d1: specifying archived log(s) in backup set
input archived log thread=1 sequence=580 RECID=288 STAMP=820739223
input archived log thread=1 sequence=581 RECID=289 STAMP=820739223
<...>

Starting backup at 16-JUL-13
<...>

I am trying to split it into headers and content using a regular expression in Java.

My poorly working regex (not in Java string format):

 ^Starting backup at \d{2}-[A-Z]{3}-\d{2}$+((?:.|\\n)+?)

In Java, the DOTALL and MULTILINE flags are enabled.

It returns the headers, but the content is empty. Any help is appreciated.

2 answers:

Answer 0: (score: 1)

(Starting backup at \d{1,2}-[A-Z]{3,4}-\d{1,2})([\s\S]*?)(?=\n{2}|$)

You can try this. See the demo. The s flag is not needed, because [\s\S] matches any character, including newlines.

https://regex101.com/r/vN3sH3/66

(Starting backup at \d{1,2}-[A-Z]{3,4}-\d{1,2})([\s\S]*?)(?=Starting backup at \d{1,2}-[A-Z]{3,4}-\d{1,2}|$)

https://regex101.com/r/vN3sH3/67
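
The second pattern above is written in regex101 notation, so here is a minimal sketch of how it might be applied in Java, where every backslash has to be doubled inside a string literal. The class name `BackupLogSplitter`, the `split` helper, and the shortened sample log are assumptions made for illustration; they are not part of the original answer. The content is trimmed because the captured text starts with the newline that follows the header.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here for illustration only.
class BackupLogSplitter {
    // Header pattern from the answer, with backslashes doubled for a Java string.
    private static final String HEADER =
            "Starting backup at \\d{1,2}-[A-Z]{3,4}-\\d{1,2}";

    // Header, then lazily anything (including newlines) up to the next header or the end.
    private static final Pattern SECTION =
            Pattern.compile("(" + HEADER + ")([\\s\\S]*?)(?=" + HEADER + "|$)");

    // Returns one {header, content} pair per section found in the log.
    public static List<String[]> split(String log) {
        List<String[]> sections = new ArrayList<>();
        Matcher m = SECTION.matcher(log);
        while (m.find()) {
            // trim() drops the newline after the header and any trailing blank lines
            sections.add(new String[] { m.group(1), m.group(2).trim() });
        }
        return sections;
    }

    public static void main(String[] args) {
        // Shortened sample input, assumed for this sketch.
        String log = "Starting backup at 14-JUL-13\n"
                + "channel d1: specifying datafile(s) in backup set\n"
                + "\n"
                + "Starting backup at 15-JUL-13\n"
                + "current log archived\n";
        for (String[] section : split(log)) {
            System.out.println("HEADER:  " + section[0]);
            System.out.println("CONTENT: " + section[1]);
        }
    }
}
```

Note that this pattern is compiled without MULTILINE, so `$` only matches at the end of the input. With MULTILINE enabled, `$` would also match at the end of every line and the lazy group would stop after the first content line.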

Answer 1: (score: 0)

You can use the DOTALL flag.

String s = "Starting backup at 14-JUL-13\n" + 
        "channel d1: starting compressed incremental level 0 datafile backup set\n" + 
        "channel d1: specifying datafile(s) in backup set\n" + 
        "input datafile file number=00004 name=/oradata/reports1/qqq01.dbf\n" + 
        "input datafile file number=00001 name=/oradata/reports1/aaa01.dbf\n" + 
        "input datafile file number=00002 name=/oradata/reports1/xxx01.dbf\n" + 
        "input datafile file number=00003 name=/oradata/reports1/bbbs01.dbf\n" + 
        "<...>\n" + 
        "\n" + 
        "Starting backup at 15-JUL-13\n" + 
        "current log archived\n" + 
        "channel d1: starting compressed archived log backup set\n" + 
        "channel d1: specifying archived log(s) in backup set\n" + 
        "input archived log thread=1 sequence=580 RECID=288 STAMP=820739223\n" + 
        "input archived log thread=1 sequence=581 RECID=289 STAMP=820739223\n" + 
        "<...>\n" + 
        "\n" + 
        "Starting backup at 16-JUL-13\n" + 
        "<...>";
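// Requires java.util.regex.Matcher and java.util.regex.Pattern.
// (?s) enables DOTALL so that '.' also matches newlines; each section
// ends at a blank line or at the end of the input.
Matcher m = Pattern.compile("(?s)(Starting backup at \\d{1,2}-[A-Z]{3,4}-\\d{1,2})\\n(.*?)(?=\\n\\n|$)").matcher(s);
while (m.find()) {
    System.out.println(m.group(1)); // header
    System.out.println(m.group(2)); // content
}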


Group 1 contains the header and group 2 contains the content.
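
A likely reason the pattern in the question seems to return empty content is that its capture group ends with a lazy quantifier (`+?`) and nothing follows it, so the engine stops after the minimum of one character, which is the newline right after the header. The sketch below illustrates this with a simplified version of the question's pattern (`$+` reduced to `$` and `(?:.|\\n)` reduced to `.`, both assumptions made to keep the example small), and compares it with a variant bounded by a lookahead. The class name `LazyTailDemo` and the `firstContent` helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, named here for illustration only.
class LazyTailDemo {
    static final String HEADER = "^Starting backup at \\d{2}-[A-Z]{3}-\\d{2}$";

    // Lazy group at the very end of the pattern: matches as little as possible.
    static final String LAZY_AT_END = HEADER + "(.+?)";

    // Same lazy group, but forced to extend up to a blank line or the end of input.
    // \z is used instead of $ because, with MULTILINE, $ matches at every line end.
    static final String BOUNDED = HEADER + "(.+?)(?=\\n\\n|\\z)";

    // Returns group 1 of the first match, using the flags named in the question.
    public static String firstContent(String regex, String log) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL | Pattern.MULTILINE).matcher(log);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened sample input, assumed for this sketch.
        String log = "Starting backup at 14-JUL-13\n"
                + "line a\n"
                + "line b\n"
                + "\n"
                + "Starting backup at 15-JUL-13\n"
                + "line c";
        // Only the newline after the header is captured, so the content looks empty.
        System.out.println("[" + firstContent(LAZY_AT_END, log) + "]");
        // The whole first section is captured, starting with that newline.
        System.out.println("[" + firstContent(BOUNDED, log) + "]");
    }
}
```

Both answers above avoid the problem in the same way: they put a lookahead after the lazy group so that it has to keep consuming characters until a section boundary is reached.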