Question

I have the regular expression "[\r\n\f]+" to find the number of lines contained in a String. My code looks like this:

Pattern pattern = Pattern.compile("[\\r\\n\\f]+");
String[] lines = pattern.split(texts);

In my unit tests I have sample strings like these:

"\t\t\t    \r\n      \n"
"\r\n"

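Parsing the first string gives 2, but parsing the second string gives 0.

I thought the second string contains 1 line, even though that line is "blank" (suppose I edit a file that begins with "\r\n" in a text editor; shouldn't the caret be placed on the second line?). Is my regular expression wrong for parsing lines, or am I missing something here?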

Edit:

I think I should make the question more explicit:

Why does

// notice the trailing space in the string
"\r\n ".split("\r\n").length == 2 // results in 2 strings {"", " "}. So this block of text has two lines.

but

// notice there's no trailing space in the string 
"\r\n".split("\r\n").length == 0 // results in an empty array. Why is "" (the empty string) not in the result, so that this block of text contains 0 lines?

Answer 1

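From the documentation for Pattern.split(CharSequence):

This method works as if by invoking the two-argument split method with the given input sequence and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

Many people would agree that this behavior is confusing. You can disable the removal of trailing empty strings by passing a negative limit (all negative values do the same thing):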

String[] lines = pattern.split(texts, -1);
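
To make the difference concrete, here is a minimal sketch that splits the string from the question with both limits. The class name SplitLimitDemo and the helper countPieces are made up for illustration and are not part of the original code.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimitDemo {
    // Hypothetical helper: number of pieces when trailing empty strings are kept.
    static int countPieces(Pattern pattern, String text) {
        return pattern.split(text, -1).length;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\r\n");

        // Limit 0 (the one-argument form): trailing empty strings are dropped.
        System.out.println(Arrays.toString(pattern.split("\r\n")));     // []

        // Negative limit: trailing empty strings are kept.
        System.out.println(Arrays.toString(pattern.split("\r\n", -1))); // [, ]

        System.out.println(countPieces(pattern, "\r\n "));              // 2
    }
}
```

Note that with a negative limit "\r\n" yields two pieces (the empty text before the separator and the empty text after it), so whether that counts as one line or two is still something the caller has to decide.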

Answer 2

What counts as a line really depends on your environment. Quoting Wikipedia:

LF: Multics, Unix and Unix-like systems (GNU/Linux, OS X, FreeBSD, AIX, Xenix, etc.), BeOS, Amiga, RISC OS and others.

CR: Commodore 8-bit machines, Acorn BBC, ZX Spectrum, TRS-80, Apple II family, Mac OS up to version 9 and OS-9

RS: QNX pre-POSIX implementation.

0x9B: Atari 8-bit machines using the ATASCII variant of ASCII (155 in decimal).

LF+CR: Acorn BBC and RISC OS spooled text output.

CR+LF: Microsoft Windows, DEC TOPS-10, RT-11 and most other early non-Unix and non-IBM operating systems, CP/M, MP/M, DOS (MS-DOS, PC DOS, etc.), Atari TOS, OS/2, Symbian OS, Palm OS, Amstrad CPC

Maybe you should try a neutral approach:

    String test = "\t\t\t    \r\n      \n";
    BufferedReader reader = new BufferedReader(new StringReader(test));
    int count = 0;
    String line=null;
    while ((line=reader.readLine()) != null) {
        System.out.println(++count+":"+line);
    }
    System.out.println("total lines == "+count);
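
The same loop can be wrapped in a small helper and run against both strings from the question. This is only a sketch: the class name LineCountDemo and the method countLines are hypothetical, and the IOException is wrapped in an unchecked exception purely to keep the example short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LineCountDemo {
    // Hypothetical helper: counts lines the way BufferedReader.readLine() sees them.
    static int countLines(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            int count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countLines("\t\t\t    \r\n      \n")); // 2
        System.out.println(countLines("\r\n"));                   // 1
    }
}
```

With this approach "\r\n" counts as one (empty) line, which is what the question expected. One difference from the original regular expression: readLine() treats only "\n", "\r" and "\r\n" as line terminators, so a form feed ("\f") does not start a new line here.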

Edited to incorporate Alan Moore's note about the use of .ready()

Why does "\r\n".split("\r\n") return an empty array?