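I have a paragraph like the one below (this is a sample paragraph; in my other samples the words and letters stay the same and only the numbers change):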
blablabla
Reflux Table - Day1
Total Upright Supine Meal PostPr Cough
Duration of Period (d,hh: mm) 23:13 14:05 09:08 00:48 05:59 00:15
Number of Refluxes 56 56 0 1 32 1
Number of Long Refluxes>5 (min) 1 1 0 0 0 0
Duration of longest reflux (min) 5 5 0 0 4 1
Time pH <4 (min) 66 66 0 0 40 1
Fraction Time pH <4 (%) 4.8 0.0 11.3 3.6
some more text blablaotherStuff
I want to extract the following paragraph:
Reflux Table - Day1
Total Upright Supine Meal PostPr Cough
Duration of Period (d,hh: mm) 23:13 14:05 09:08 00:48 05:59 00:15
Number of Refluxes 56 56 0 1 32 1
Number of Long Refluxes>5 (min) 1 1 0 0 0 0
Duration of longest reflux (min) 5 5 0 0 4 1
Time pH <4 (min) 66 66 0 0 40 1
Fraction Time pH <4 (%) 4.8 0.0 11.3 3.6
To do this, I have the following code:
Pattern ReflDay1_pattern = Pattern.compile("Reflux Table - Day1 .*?Fraction Time[^\n]*",Pattern.DOTALL);
Matcher matcherReflDay1_pattern = ReflDay1_pattern.matcher(s);
ArrayList<String> ReflDay1_arr = new ArrayList<String>();
try {
    while (matcherReflDay1_pattern.find()) {
        ReflDay1_arr.add(matcherReflDay1_pattern.group(0));
        System.out.println("matcherReflDay1_pattern.group(0)"+matcherReflDay1_pattern.group(0));
    }
}
catch (Exception e) {
    e.printStackTrace();
}
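However, the result cuts off the last value, so I lose '3.6'. This happens with every paragraph I try. How can I make sure it is included? Is the regular expression the problem (I have tested the regex separately, and it does extract what it should, including the value 3.6)? This is the output I get: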
Reflux Table - Day1
Total Upright Supine Meal PostPr Cough
Duration of Period (d,hh: mm) 23:13 14:05 09:08 00:48 05:59 00:15
Number of Refluxes 56 56 0 1 32 1
Number of Long Refluxes>5 (min) 1 1 0 0 0 0
Duration of longest reflux (min) 5 5 0 0 4 1
Time pH <4 (min) 66 66 0 0 40 1
Fraction Time pH <4 (%) 4.8 0.0 11.3
Answer 0 (score: 1)
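My guess is that the line endings are actually "\r\n" (Windows), but that only the 3.6 is written as "\n 3.6" or similar. Notepad would display that as being on the same line.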
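One way to check this guess is to print the input with its line-terminator characters made visible. The following is only a sketch: the class name `LineEndingDebug`, the helper `showLineEndings`, and the sample string are made up for illustration, and the sample merely imitates the layout suspected above.

```java
public class LineEndingDebug {

    // Replace CR and LF with the visible two-character sequences \r and \n.
    static String showLineEndings(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Hypothetical sample: Windows line endings, but a bare "\n" before 3.6.
        String s = "Time pH <4 (min) 66 66 0 0 40 1\r\n"
                 + "Fraction Time pH <4 (%) 4.8 0.0 11.3\n 3.6\r\n";
        System.out.println(showLineEndings(s));
    }
}
```

If the real input shows a lone \n in front of the last value while every other line ends in \r\n, the original `[^\n]*` stops exactly there, which would explain the missing 3.6.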
Pattern ReflDay1_pattern = Pattern.compile(
"Reflux Table - Day1 .*?Fraction Time[^\r\n]*\n[^\r\n]*", Pattern.DOTALL);
Using \r in the character classes also prevents that character from trailing the matched string.
String g = matcherReflDay1_pattern.group(0).replaceAll("\r?\n", " ");
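To see the pattern and the `replaceAll` call working together, here is a small self-contained sketch. The class name `RefluxExtract`, the method `extract`, and the shortened sample string are illustrative assumptions; the sample keeps a space after "Day1" because the pattern requires one, and it places a bare "\n" before 3.6 as guessed above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RefluxExtract {

    // Same pattern as above: the Fraction Time line, one "\n", then the next line.
    static final Pattern REFLUX_DAY1 = Pattern.compile(
            "Reflux Table - Day1 .*?Fraction Time[^\r\n]*\n[^\r\n]*", Pattern.DOTALL);

    // Returns the first match flattened onto one line, or null if nothing matches.
    static String extract(String s) {
        Matcher m = REFLUX_DAY1.matcher(s);
        return m.find() ? m.group(0).replaceAll("\r?\n", " ") : null;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened input with mixed line endings.
        String s = "blablabla\r\n"
                 + "Reflux Table - Day1 \r\n"
                 + "Time pH <4 (min) 66 66 0 0 40 1\r\n"
                 + "Fraction Time pH <4 (%) 4.8 0.0 11.3\n 3.6\r\n"
                 + "some more text blablaotherStuff\r\n";
        System.out.println(extract(s));
    }
}
```

Note that this pattern only matches when a bare "\n" directly follows the Fraction Time line. If the last value is actually on the same line, the extra `\n[^\r\n]*` part will not match and the pattern needs adjusting.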
Answer 1 (score: 0)
I tried this with your code snippet and it works fine:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Parser {

    public static void main(String[] args) throws Exception {
        FileInputStream f = new FileInputStream("C:\\Users\\NPGM81B\\Desktop\\text.txt");
        Pattern ReflDay1_pattern = Pattern.compile(
                "Reflux Table - Day1 .*?Fraction Time[^\n]*", Pattern.DOTALL);
        Matcher matcherReflDay1_pattern = ReflDay1_pattern.matcher(getStringFromInputStream(f));
        ArrayList<String> ReflDay1_arr = new ArrayList<String>();
        try {
            while (matcherReflDay1_pattern.find()) {
                ReflDay1_arr.add(matcherReflDay1_pattern.group(0));
                System.out.println("matcherReflDay1_pattern.group(0) : "
                        + matcherReflDay1_pattern.group(0));
            }
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }

    // convert InputStream to String
    // Note: readLine() returns each line without its line terminator, so the
    // lines are concatenated here with no "\r" or "\n" between them.
    private static String getStringFromInputStream(InputStream is) {
        BufferedReader br = null;
        StringBuilder sb = new StringBuilder();
        String line;
        try {
            br = new BufferedReader(new InputStreamReader(is));
            while ((line = br.readLine()) != null) {
                sb.append(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (br != null) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return sb.toString();
    }
}
text.txt
-------------
Reflux Table - Day1
Total Upright Supine Meal PostPr Cough
Duration of Period (d,hh: mm) 23:13 14:05 09:08 00:48 05:59 00:15
Number of Refluxes 56 56 0 1 32 1
Number of Long Refluxes>5 (min) 1 1 0 0 0 0
Duration of longest reflux (min) 5 5 0 0 4 1
Time pH <4 (min) 66 66 0 0 40 1
Fraction Time pH <4 (%) 4.8 0.0 11.3 3.6
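One caveat about `getStringFromInputStream`: `BufferedReader.readLine()` returns each line without its terminator, and the loop appends the lines with nothing in between, so the string handed to the matcher contains no "\n" at all. That may be why the truncation does not show up in this test. To reproduce the original problem, the file probably has to be read with its line terminators intact. A minimal sketch, assuming the file is UTF-8 encoded (the class name `ReadWholeFile` and the method `readAll` are made up for illustration):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWholeFile {

    // Read the whole file into one String, keeping every "\r" and "\n" as stored.
    // Assumption: the file is UTF-8 encoded.
    static String readAll(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file with mixed line endings, then read it back.
        Path tmp = Files.createTempFile("reflux", ".txt");
        String content = "Fraction Time pH <4 (%) 4.8 0.0 11.3\n 3.6\r\n";
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        String s = readAll(tmp);
        System.out.println(s.equals(content)); // line terminators survive the round trip
        Files.delete(tmp);
    }
}
```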