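I need to parse UTF-8 input (from a text file) character by character, where "character" means a complete UTF-8 character (a Unicode code point), not a Java char.
What approach should I use?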
Answer 0 (score: 1)
Since Java 8 there is CharSequence.codePoints(), which returns an IntStream with one int per Unicode code point.
For example:
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.IntStream;

// if you want to work line by line, use Files.readAllLines()
// if you use Guava, there's also Guava's Files.toString() for reading the whole file into a String
byte[] bytes = Files.readAllBytes(Paths.get("test.txt"));
String text = new String(bytes, StandardCharsets.UTF_8);
IntStream codePoints = text.codePoints();
// do something with the code points (each one is an int, printed here as a decimal number)
codePoints.forEach(codePoint -> System.out.println(codePoint));
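To make the difference between a Java char and a code point concrete, here is a minimal sketch. The class name CodePointDemo, the helper codePointsOf, and the sample string are made up for illustration; an in-memory byte array stands in for the file contents so the example runs without test.txt.

```java
import java.nio.charset.StandardCharsets;

class CodePointDemo {
    // Decodes UTF-8 bytes and returns one int per Unicode code point.
    static int[] codePointsOf(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.codePoints().toArray();
    }

    public static void main(String[] args) {
        // Hypothetical sample: 'a', U+00E9, and U+1F600 (written as a surrogate pair).
        String sample = "a\u00E9\uD83D\uDE00";
        byte[] bytes = sample.getBytes(StandardCharsets.UTF_8);

        int[] codePoints = codePointsOf(bytes);
        System.out.println("chars: " + sample.length());           // 4 UTF-16 code units
        System.out.println("code points: " + codePoints.length);   // 3 code points

        for (int cp : codePoints) {
            // Character.toChars turns a code point back into one or two chars.
            String asText = new String(Character.toChars(cp));
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase() + " " + asText);
        }
    }
}
```

The last character occupies two chars in the String but shows up as a single int in the code point array, which is the behavior the question asks for.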
Answer 1 (score: -2)
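You can do this with an InputStreamReader and its read() method. Be aware that read() returns an int holding a single Java char (one UTF-16 code unit), not a full code point: a character outside the Basic Multilingual Plane comes back as two consecutive reads, a high surrogate followed by a low surrogate, which have to be combined (for example with Character.toCodePoint()) to get the code point. More information: http://docs.oracle.com/javase/tutorial/i18n/text/stream.html
FileInputStream fis = new FileInputStream("test.txt");
InputStreamReader isr = new InputStreamReader(fis, StandardCharsets.UTF_8);
// isr.read() returns one char (UTF-16 code unit) per call, or -1 at end of stream.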
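If the input has to be streamed rather than loaded into a String, the Reader approach can be extended to yield code points. The following is only a sketch: the class ReaderCodePoints and the helper readCodePoint are hypothetical names, a ByteArrayInputStream stands in for the FileInputStream, and treating an unpaired surrogate as an error is an assumption made here, not something the original answer specifies.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class ReaderCodePoints {
    // Returns the next Unicode code point from the reader, or -1 at end of stream.
    static int readCodePoint(Reader reader) throws IOException {
        int first = reader.read();
        if (first == -1) {
            return -1;
        }
        char c = (char) first;
        if (Character.isHighSurrogate(c)) {
            // A high surrogate must be followed by a low surrogate.
            int second = reader.read();
            if (second == -1 || !Character.isLowSurrogate((char) second)) {
                throw new IOException("unpaired high surrogate");
            }
            return Character.toCodePoint(c, (char) second);
        }
        if (Character.isLowSurrogate(c)) {
            throw new IOException("unpaired low surrogate");
        }
        return first;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data; with a real file, wrap a FileInputStream instead.
        byte[] bytes = "a\u00E9\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int cp;
            while ((cp = readCodePoint(reader)) != -1) {
                System.out.println("U+" + Integer.toHexString(cp).toUpperCase());
            }
        }
    }
}
```

For real files, wrapping the InputStreamReader in a BufferedReader avoids a decoding round trip on every read() call.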