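Reading the next character (a full Unicode code point) from a Java input stream

Time: 2014-10-15 20:44:36

Tags: java utf-8

I need to parse UTF-8 input (from a text file) character by character, where "character" means a full Unicode code point decoded from the UTF-8 data, not a Java char.

Which approach should I use?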

2 answers:

Answer 0: (score: 1)

Since Java 8 there is CharSequence.codePoints().

For example:

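import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.IntStream;

// if you want to work line by line, use Files.readAllLines()
// if you use Guava, there's also Guava's Files.toString() for reading the whole file into a String
byte[] bytes = Files.readAllBytes(Paths.get("test.txt"));
String text = new String(bytes, StandardCharsets.UTF_8);

// each element of the stream is one full code point; surrogate pairs are already combined
IntStream codePoints = text.codePoints();

// do something with the code points
codePoints.forEach(codePoint -> System.out.println(codePoint));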

Answer 1: (score: -2)

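You can do this with the read() method of InputStreamReader. read() returns an int holding one Java char (a UTF-16 code unit), not a full code point, so a character outside the Basic Multilingual Plane arrives as two consecutive surrogate values that you have to combine yourself. See here for more information: http://docs.oracle.com/javase/tutorial/i18n/text/stream.html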

FileInputStream fis = new FileInputStream("test.txt");
InputStreamReader isr = new InputStreamReader(fis, StandardCharsets.UTF_8);
// Use isr.read() to read one char (UTF-16 code unit) at a time.
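
Because Reader.read() hands back one UTF-16 char per call, getting full code points from a stream means pairing up surrogates by hand. The following is a minimal sketch of that idea, not a definitive implementation: the class name CodePointReader and the helper readCodePoint are hypothetical, the input is an in-memory byte array rather than test.txt so the example is self-contained, and throwing on an unpaired surrogate is just one possible policy.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class CodePointReader {

    // Hypothetical helper: returns the next full code point, or -1 at end of stream.
    static int readCodePoint(Reader reader) throws IOException {
        int first = reader.read();
        if (first < 0 || !Character.isHighSurrogate((char) first)) {
            // End of stream, or a char that already is a complete code point.
            return first;
        }
        int second = reader.read();
        if (second >= 0 && Character.isLowSurrogate((char) second)) {
            // Combine the surrogate pair into one supplementary code point.
            return Character.toCodePoint((char) first, (char) second);
        }
        // Assumption: treat a high surrogate without its partner as an error.
        throw new IOException("Unpaired high surrogate in input");
    }

    public static void main(String[] args) throws IOException {
        // 'a', U+1F600 (written as a surrogate pair), and U+00E9, encoded as UTF-8.
        byte[] utf8 = "a\uD83D\uDE00\u00E9".getBytes(StandardCharsets.UTF_8);
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8), StandardCharsets.UTF_8)) {
            int codePoint;
            while ((codePoint = readCodePoint(reader)) != -1) {
                System.out.println(String.format("U+%04X", codePoint));
            }
        }
    }
}
```

For a real file, wrapping the FileInputStream in a BufferedReader around the InputStreamReader would avoid a system call per character; the pairing logic stays the same.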