Java regex unexpectedly drops the last value

Time: 2016-09-22 09:36:59

Tags: java regex

I have a paragraph like the one below (this is a sample; in my other samples the words and letters stay the same and only the numbers change):

blablabla

Reflux Table - Day1 
 Total Upright Supine Meal PostPr Cough
 Duration of Period  (d,hh: mm) 23:13 14:05 09:08 00:48 05:59 00:15
Number of Refluxes 56 56 0 1 32 1
Number of Long Refluxes>5  (min) 1 1 0 0 0 0
Duration of longest reflux (min) 5 5 0 0 4 1
Time pH <4  (min) 66 66 0 0 40 1
Fraction Time pH <4  (%) 4.8 0.0 11.3 3.6

some more text blablaotherStuff

I want to extract the following paragraph:

Reflux Table - Day1 
 Total Upright Supine Meal PostPr Cough
 Duration of Period  (d,hh: mm) 23:13 14:05 09:08 00:48 05:59 00:15
Number of Refluxes 56 56 0 1 32 1
Number of Long Refluxes>5  (min) 1 1 0 0 0 0
Duration of longest reflux (min) 5 5 0 0 4 1
Time pH <4  (min) 66 66 0 0 40 1
Fraction Time pH <4  (%) 4.8 0.0 11.3 3.6

To do that, I have the following code:

Pattern ReflDay1_pattern = Pattern.compile("Reflux Table - Day1 .*?Fraction Time[^\n]*", Pattern.DOTALL);
Matcher matcherReflDay1_pattern = ReflDay1_pattern.matcher(s);
ArrayList<String> ReflDay1_arr = new ArrayList<String>();

try {
    while (matcherReflDay1_pattern.find()) {
        ReflDay1_arr.add(matcherReflDay1_pattern.group(0));
        System.out.println("matcherReflDay1_pattern.group(0)" + matcherReflDay1_pattern.group(0));
    }
} catch (Exception e) {
    e.printStackTrace();
}

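However, the result cuts off the last value, so I lose the '3.6'. This happens with every paragraph I try. How can I make sure it is included? Is the regex at fault? (I have tested the regex separately, and it does extract what it should, including the value 3.6.) This is the output I get: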

Reflux Table - Day1 
 Total Upright Supine Meal PostPr Cough
 Duration of Period  (d,hh: mm) 23:13 14:05 09:08 00:48 05:59 00:15
Number of Refluxes 56 56 0 1 32 1
Number of Long Refluxes>5  (min) 1 1 0 0 0 0
Duration of longest reflux (min) 5 5 0 0 4 1
Time pH <4  (min) 66 66 0 0 40 1
Fraction Time pH <4  (%) 4.8 0.0 11.3

2 answers:

Answer 0 (score: 1)

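My guess is that the line endings are actually "\r\n" (Windows), but that the 3.6 alone is written as "\n 3.6" or something similar. Notepad would display that as being on the same line.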

Pattern ReflDay1_pattern = Pattern.compile(
        "Reflux Table - Day1 .*?Fraction Time[^\r\n]*\n[^\r\n]*", Pattern.DOTALL);

Excluding \r in the character class also keeps that character from trailing the matched string.

String g = matcherReflDay1_pattern.group(0).replaceAll("\r?\n", " ");
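
The guess above can be checked on a small synthetic string. The following is only a sketch: the class name `MixedLineEndingDemo`, the helper `extract`, and the shortened input in `sampleText` are made up for illustration, and the input assumes the mixed line endings described in this answer ("\r\n" after each row, a bare "\n" before 3.6).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MixedLineEndingDemo {

    // Pattern from the question: the last row ends at the first '\n'.
    static final Pattern ORIGINAL = Pattern.compile(
            "Reflux Table - Day1 .*?Fraction Time[^\n]*", Pattern.DOTALL);

    // Pattern from this answer: allows one bare '\n' inside the last row.
    static final Pattern WIDENED = Pattern.compile(
            "Reflux Table - Day1 .*?Fraction Time[^\r\n]*\n[^\r\n]*", Pattern.DOTALL);

    // Collects every match of the given pattern in the text.
    static List<String> extract(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Hypothetical input: rows end in "\r\n", but a bare "\n" sits before 3.6.
    static String sampleText() {
        return "blablabla\r\n"
                + "Reflux Table - Day1 \r\n"
                + "Time pH <4  (min) 66 66 0 0 40 1\r\n"
                + "Fraction Time pH <4  (%) 4.8 0.0 11.3\n 3.6\r\n"
                + "\r\n"
                + "some more text blablaotherStuff\r\n";
    }

    public static void main(String[] args) {
        String s = sampleText();
        for (String hit : extract(ORIGINAL, s)) {
            System.out.println("original: " + hit.replaceAll("\r?\n", " "));
        }
        for (String hit : extract(WIDENED, s)) {
            System.out.println("widened:  " + hit.replaceAll("\r?\n", " "));
        }
    }
}
```

On this input the match of the question's pattern ends at 11.3, while the match of the widened pattern ends at 3.6. The widened pattern expects a line feed after the first part of the last row, so it is tailored to this guess and is not a general fix.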

Answer 1 (score: 0)

I tried this with your code snippet and it works fine!

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Parser {

    public static void main(String[] args) throws Exception {
        FileInputStream f = new FileInputStream("C:\\Users\\NPGM81B\\Desktop\\text.txt");
        Pattern ReflDay1_pattern = Pattern.compile(
                "Reflux Table - Day1 .*?Fraction Time[^\n]*", Pattern.DOTALL);
        Matcher matcherReflDay1_pattern = ReflDay1_pattern.matcher(getStringFromInputStream(f));
        ArrayList<String> ReflDay1_arr = new ArrayList<String>();

        try {
            while (matcherReflDay1_pattern.find()) {
                ReflDay1_arr.add(matcherReflDay1_pattern.group(0));
                System.out.println("matcherReflDay1_pattern.group(0)   : "
                        + matcherReflDay1_pattern.group(0));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

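    // Convert InputStream to String. Note: readLine() strips line terminators,
    // so the lines are appended to each other with no separator in between.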
    private static String getStringFromInputStream(InputStream is) {

        BufferedReader br = null;
        StringBuilder sb = new StringBuilder();

        String line;
        try {

            br = new BufferedReader(new InputStreamReader(is));
            while ((line = br.readLine()) != null) {
                sb.append(line);
            }

        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (br != null) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

        return sb.toString();

    }


}

text.txt
-------------
Reflux Table - Day1 
 Total Upright Supine Meal PostPr Cough
 Duration of Period  (d,hh: mm) 23:13 14:05 09:08 00:48 05:59 00:15
Number of Refluxes 56 56 0 1 32 1
Number of Long Refluxes>5  (min) 1 1 0 0 0 0
Duration of longest reflux (min) 5 5 0 0 4 1
Time pH <4  (min) 66 66 0 0 40 1
Fraction Time pH <4  (%) 4.8 0.0 11.3 3.6
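
One possible reason the snippet above appears to work is that `BufferedReader.readLine()` returns each line without its terminator, so the joined string contains no "\n" at all and `[^\n]*` runs to the end of the input. The sketch below illustrates this. The class `ReadLineJoinDemo`, its helpers, and the sample input (which assumes a bare "\n" before 3.6 and text after the table) are hypothetical and not taken from the original file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReadLineJoinDemo {

    // Same pattern as in the question and in this answer.
    static final Pattern TABLE = Pattern.compile(
            "Reflux Table - Day1 .*?Fraction Time[^\n]*", Pattern.DOTALL);

    // Mirrors getStringFromInputStream: appends each line without its terminator.
    static String joinWithoutTerminators(String text) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Returns the first match of TABLE, or null if there is none.
    static String firstMatch(String text) {
        Matcher m = TABLE.matcher(text);
        return m.find() ? m.group() : null;
    }

    // Hypothetical input with a bare "\n" before 3.6 and text after the table.
    static String sampleText() {
        return "blablabla\r\n"
                + "Reflux Table - Day1 \r\n"
                + "Fraction Time pH <4  (%) 4.8 0.0 11.3\n 3.6\r\n"
                + "\r\n"
                + "some more text blablaotherStuff\r\n";
    }

    public static void main(String[] args) {
        String raw = sampleText();
        System.out.println("raw:    " + firstMatch(raw).replaceAll("\r?\n", " "));
        System.out.println("joined: " + firstMatch(joinWithoutTerminators(raw)));
    }
}
```

On this input the match against the raw text stops at 11.3, while the match against the joined text includes 3.6 but also continues into the text that follows the table. If the text after the table must be excluded, the line terminators need to be preserved when the file is read.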