Question

I am reading a file that is a classpath resource:

URL dictionary = Main.class.getResource("/british-english.txt");
List<String> lines;
// try-with-resources closes the reader (and the underlying stream) when done
try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(dictionary.openStream(), StandardCharsets.UTF_8))) {
    lines = bufferedReader.lines().collect(Collectors.toList());
}

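How should I handle the case where the file is encoded with a different charset, such as UTF_16? Is there a way to detect this, other than looking at the list of strings and checking whether they are English words?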

Answer 1

Try using the Apache Tika API to run charset detection on the input you have: https://tika.apache.org/0.8/api/org/apache/tika/parser/txt/CharsetDetector.html
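If adding a dependency is not an option, a rough JDK-only check can catch the common cases. The sketch below is not the Tika detector and is far less capable: it only looks for a byte order mark (BOM) and otherwise tests whether the bytes decode as strict UTF-8. The class name CharsetSniffer and the method sniff are hypothetical names chosen for illustration, and a null result means "could not tell".

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK or of Apache Tika.
public class CharsetSniffer {

    /** Guesses a charset from the raw bytes, or returns null when unsure. */
    static Charset sniff(byte[] bytes) {
        // UTF-8 BOM. Note: Java's UTF-8 decoder keeps the BOM as U+FEFF.
        if (startsWith(bytes, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        // UTF-16 BOM, big- or little-endian. Java's UTF_16 decoder reads
        // the BOM to pick the byte order and strips it from the output.
        // (FF FE 00 00 would be a UTF-32LE BOM; this sketch ignores that.)
        if (startsWith(bytes, 0xFE, 0xFF) || startsWith(bytes, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16;
        }
        // Heuristic: NUL bytes are valid UTF-8 but very unusual in text,
        // and they are typical of BOM-less UTF-16. Treat them as "unsure".
        for (byte b : bytes) {
            if (b == 0) {
                return null;
            }
        }
        // No BOM: accept UTF-8 only if every byte sequence is well-formed.
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    /** True if bytes begins with the given unsigned byte values. */
    static boolean startsWith(byte[] bytes, int... prefix) {
        if (bytes.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((bytes[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

To use it with the code from the question, read the resource into a byte array first (for example with dictionary.openStream().readAllBytes() on Java 9 or later), pass the array to sniff, and then decode with new String(bytes, charset). Legacy single-byte encodings such as ISO-8859-1 cannot be told apart this way, which is where a statistical detector like Tika's is the better fit.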
