Get the next n lines after a regex match

Time: 2016-09-30 10:45:38

Tags: java regex

I have some text, an example of which is shown below:

Lactose Hydrogen Breath Test
  Time
        Time Point (min)
        H2  (ppm)
        H2 Change

    (ppm)
        Hydrogen (ppm)

        0937
        0
        0/0

        Time point (min)

        0
        10
        20
        30
        40
        50
        60
        70
        80
        90
        100


        Notes: Measurements at 120 and 150 mins are insignificant changes and are most probably due to sporadic error.

        Results are not consistent with Lactose malabsorption.

        Lactose intolerance is not suggested.

This is now some other text that can be anything

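I just want to extract the first five lines after 'Notes' and leave out everything else (in this case the extract ends at "Lactose intolerance is not suggested.", but any kind of text can follow it).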

I am currently using the following Java to do the extraction:

public Map<String,String> LactoseTestExtractor(String str){

        Pattern match_pattern = Pattern.compile("Lactose Hydrogen Breath Test(.*?Interpretation[^\\r|^\\n]*)",Pattern.DOTALL);
        Matcher matchermatch_pattern = match_pattern.matcher(str);

        Pattern match_pattern2 = Pattern.compile("Lactose Hydrogen Breath Test.*?(Notes:.*?\\r|\\n[\\r|\\n]?.*?\\r|\\n[\\r|\\n]?)",Pattern.DOTALL);
        Matcher matchermatch_pattern2 = match_pattern2.matcher(str);

        if (matchermatch_pattern.find()) {
            lact=matchermatch_pattern.group(1).toString().trim();
            System.out.println("lact1"+lact);

        }

        else if (matchermatch_pattern2.find()){
            lact=matchermatch_pattern2.group(1).toString().trim();
            System.out.println("lact2"+lact);

        }

However, I get the whole match rather than just the part I want:

Measurements at 120 and 150 mins are insignificant changes and are most probably due to sporadic error.

        Results are not consistent with Lactose malabsorption.

        Lactose intolerance is not suggested.

How can I fix this? I'm not sure whether it is a Java problem or a regex problem.

1 answer:

Answer 0 (score: 1)

First, Java 8 supports \R to match a linebreak, so you don't need to spell out \r and \n yourself.
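As a quick illustration of \R, here is a minimal sketch that splits a string containing mixed line endings. The class and method names (LinebreakDemo, splitLines) are made up for this example.

```java
import java.util.Arrays;

final class LinebreakDemo {
    // \R matches "\r\n" as a single linebreak, as well as a lone "\n" or "\r".
    static String[] splitLines(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        // Windows, Unix and old Mac line endings mixed in one string.
        String mixed = "first\r\nsecond\nthird\rfourth";
        System.out.println(Arrays.toString(splitLines(mixed)));
    }
}
```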

For the regex, you can use a lookbehind to match Notes: and then consume the next 5 lines, like this:

(?<=Notes:)(.*\\R){5}

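The result is in group(0). Blank lines count toward the 5, because each repetition of .*\\R consumes exactly one line, empty or not.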
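Putting the pattern to work might look like the sketch below. The class name NotesExtractor, the method extractNotes, and the shortened sample text are assumptions for illustration, not the asker's actual code. Pattern.DOTALL is deliberately left off so that . stops at the end of each line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NotesExtractor {
    // (?<=Notes:) starts the match right after "Notes:" without including it.
    // (.*\\R){5} then consumes five lines; '.' does not cross a linebreak
    // because DOTALL is not set.
    private static final Pattern NOTES = Pattern.compile("(?<=Notes:)(.*\\R){5}");

    // Returns the five lines after "Notes:", trimmed, or null if there is no match.
    static String extractNotes(String text) {
        Matcher m = NOTES.matcher(text);
        return m.find() ? m.group(0).trim() : null;
    }

    public static void main(String[] args) {
        // Shortened version of the sample report from the question.
        String sample = "Lactose Hydrogen Breath Test\n"
                + "        100\n"
                + "\n"
                + "        Notes: Measurements at 120 and 150 mins are insignificant changes.\n"
                + "\n"
                + "        Results are not consistent with Lactose malabsorption.\n"
                + "\n"
                + "        Lactose intolerance is not suggested.\n"
                + "\n"
                + "This is now some other text that can be anything\n";
        System.out.println(extractNotes(sample));
    }
}
```

Note that the pattern only matches when at least five linebreaks follow Notes:, so a report that ends sooner yields no match (null in this sketch).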