Question

I am reading a file with the following code:

Scanner in = new Scanner(new File(fileName));
while (in.hasNextLine()) {
    String[] line = in.nextLine().trim().split("[ \t]");
    .
    .
    .
}

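When I open the file in vim, some lines begin with special characters (see the hex dump below), but the Java code cannot read these lines. When it reaches one of them, it behaves as if it had hit the end of the file, and hasNextLine() returns false!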

EDIT: here is the hex dump of the problematic line mentioned above:

0000000: e280 9c20 302e 3230 3133 3220 302e 3231  ... 0.20132 0.21
0000010: 3431 392d 302e 3034 0a                   419-0.04.

Answer 1

@VGR got it right.

tl;dr: Use Scanner in = new Scanner(new File(fileName), "ISO-8859-1");

What appears to be happening is:

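Your file is not valid UTF-8 because of that lone 0x9C byte.
The Scanner is reading the file as UTF-8, since that is the system default.
The underlying libraries throw a MalformedInputException.
The Scanner catches and hides it (a well-meaning but misguided design decision).
It starts reporting that there are no more lines.
You won't know anything has gone wrong unless you actually ask the Scanner.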
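
The first and third steps can be checked in isolation. The following is a minimal sketch, assuming (as the answer states) a lone 0x9C byte in the data; the class name DecodeDemo and the helper strictDecode are made up for illustration. It relies on the fact that a freshly created CharsetDecoder reports malformed input by throwing instead of substituting a replacement character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
  // Hypothetical helper: strictly decodes the bytes with the given charset.
  // Returns null if the bytes are not valid in that charset.
  static String strictDecode(byte[] bytes, Charset cs) {
    try {
      // A new decoder's default action for malformed input is to report it,
      // so decode() throws instead of silently replacing the bad byte.
      return cs.newDecoder().decode(ByteBuffer.wrap(bytes)).toString();
    } catch (CharacterCodingException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    // "World " followed by a lone 0x9C byte and a newline (assumed sample data).
    byte[] data = {'W', 'o', 'r', 'l', 'd', ' ', (byte) 0x9C, '\n'};
    System.out.println("UTF-8 valid: "
        + (strictDecode(data, StandardCharsets.UTF_8) != null));
    System.out.println("ISO-8859-1 valid: "
        + (strictDecode(data, StandardCharsets.ISO_8859_1) != null));
  }
}
```

Run as written, this should report the sample bytes as invalid UTF-8 but valid ISO-8859-1, which matches the explanation above.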

Here's an MCVE:

import java.io.*;
import java.util.*;

class Test {
  public static void main(String[] args) throws Exception {
    Scanner in = new Scanner(new File(args[0]), args[1]);
    while (in.hasNextLine()) {
      String line = in.nextLine();
      System.out.println("Line: " + line);
    }
    System.out.println("Exception if any: " + in.ioException());
  }
}

Here's an example of a normal invocation:

$ printf 'Hello\nWorld\n' > myfile && java Test myfile UTF-8
Line: Hello
Line: World
Exception if any: null

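Here's what you are seeing (except that you don't retrieve and display the hidden exception). Note in particular that no lines are shown at all: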

$ printf 'Hello\nWorld \234\n' > myfile && java Test myfile UTF-8
Exception if any: java.nio.charset.MalformedInputException: Input length = 1

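And here it is decoded as ISO-8859-1, an encoding in which every byte sequence is valid (0x9C maps to a non-printing control character, so it does not show up in the terminal):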

$ printf 'Hello\nWorld \234\n' > myfile && java Test myfile ISO-8859-1
Line: Hello
Line: World
Exception if any: null

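If you're only interested in ASCII data and don't have any UTF-8 strings, you can simply ask the scanner to use ISO-8859-1 by passing it as the second parameter to the Scanner constructor: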

Scanner in = new Scanner(new File(fileName), "ISO-8859-1");
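
If you would rather not have a decoding failure pass silently as an early end of file, one option is to check ioException() once the loop finishes. This is only a sketch under assumptions: the class SafeScan and the helper readLines are hypothetical names, and rethrowing the hidden exception is one possible policy, not the only one.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SafeScan {
  // Hypothetical helper: reads every line of the file with the given charset,
  // then rethrows any exception the Scanner caught and hid while reading.
  static List<String> readLines(File file, String charsetName) throws IOException {
    List<String> lines = new ArrayList<>();
    try (Scanner in = new Scanner(file, charsetName)) {
      while (in.hasNextLine()) {
        lines.add(in.nextLine());
      }
      // hasNextLine() returning false may mean "end of file" or "read error";
      // ioException() is how to tell the two apart.
      IOException hidden = in.ioException();
      if (hidden != null) {
        throw hidden;
      }
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    if (args.length < 1) {
      System.err.println("usage: java SafeScan <file>");
      return;
    }
    for (String line : readLines(new File(args[0]), "ISO-8859-1")) {
      System.out.println("Line: " + line);
    }
  }
}
```

With this helper, reading the problem file as UTF-8 should fail loudly with the MalformedInputException rather than quietly returning fewer lines, while reading it as ISO-8859-1 returns every line.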

Java can't read lines from file