My strings:
2017.11.22SAMPLE NEWS - I am some text here and there
2018.12.30SAMPLE NEWS 2 - I am some text here and there
countLines():
public static int countLines(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int count = 0;
        int readChars = 0;
        boolean endsWithoutNewLine = false;
        while ((readChars = is.read(c)) != -1) {
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n')
                    ++count;
            }
            endsWithoutNewLine = (c[readChars - 1] != '\n');
        }
        if (endsWithoutNewLine) {
            ++count;
        }
        return count;
    } finally {
        is.close();
    }
}
My code that matches the regular expressions against the strings - loadTextFromFile():
public static String loadTextFromFile(String filename, int type) throws FileNotFoundException {
    String match = "";
    File file = new File(filename);
    Scanner scanner = new Scanner(file);
    if (type == 0) { // Match date
        while (scanner.hasNext()) {
            String line = scanner.nextLine();
            Matcher m = Pattern.compile("(\\d{4}[\\.]\\d{2}[\\.]\\d{2})").matcher(line);
            while (m.find()) {
                match = m.group(1).trim();
                System.out.println("date: " + match);
            }
        }
    } else {
        while (scanner.hasNext()) {
            String line = scanner.nextLine();
            Matcher m = Pattern.compile("((?!.*[\\d+\\.?\\d+]).*$)").matcher(line);
            while (m.find()) {
                match = m.group(1).trim();
                System.out.println("text: " + match);
            }
        }
    }
    return match;
}
main():
String date, string;
for (int i = 0; i < countLines(FILE_NAME); i++) {
    date = loadTextFromFile(FILE_NAME, 0);
    string = loadTextFromFile(FILE_NAME, 1);
    System.out.println("date:" + i + " " + date);
    System.out.println("string:" + i + " " + string);
}
Output:
date:0 2018.12.30
string:0
date:1 2018.12.30
string:1
count: 2
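I am sure this is a problem with the regular expression, but I can't figure out where. I debugged the application and checked: it enters while (m.find()) twice and ends up with string = "", which means it finds more than one match for the regular expression.
How can I fix this? I only want to extract SAMPLE NEWS - I am some text here and there with a regular expression.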
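For reference, the empty result seems to come from two things. After the first match runs to the end of the line, m.find() succeeds once more with an empty match at the end of the input, and that empty string overwrites match. In addition, loadTextFromFile() rescans the whole file on every call and only returns the last match. One possible direction, as a minimal sketch only: match each line once with a single pattern that captures the date and the remaining text in two groups. The class name LineParser and the method parseLine are hypothetical names made up for this illustration, and the sketch assumes every line starts with a yyyy.MM.dd date immediately followed by the text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original code.
public class LineParser {
    // Group 1: the leading date (yyyy.MM.dd). Group 2: everything after it.
    // Assumption: each line begins with the date, with no leading whitespace.
    private static final Pattern LINE =
            Pattern.compile("^(\\d{4}\\.\\d{2}\\.\\d{2})\\s*(.*)$");

    /** Returns {date, text}, or null if the line does not start with a date. */
    public static String[] parseLine(String line) {
        Matcher m = LINE.matcher(line);
        // matches() tests the whole line exactly once, so there is no second,
        // empty match as there is when find() is called in a loop.
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2).trim() };
    }

    public static void main(String[] args) {
        // Sample lines standing in for the contents of the file.
        String[] lines = {
            "2017.11.22SAMPLE NEWS - I am some text here and there",
            "2018.12.30SAMPLE NEWS 2 - I am some text here and there"
        };
        for (int i = 0; i < lines.length; i++) {
            String[] parts = parseLine(lines[i]);
            if (parts != null) {
                System.out.println("date:" + i + " " + parts[0]);
                System.out.println("string:" + i + " " + parts[1]);
            }
        }
    }
}
```

With this shape the file only needs to be read once, calling parseLine on each line from the Scanner, instead of calling countLines() and loadTextFromFile() again on every loop iteration.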