Scenario: I want to read an Arabic dataset encoded in UTF-8. The words on each line are separated by spaces.
Problem: When I read each line, the output is:
??????? ?? ???? ?? ???
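A quick way to tell whether these `?` characters come from decoding the file or from printing it is to dump the Unicode code points of a line. The sketch below is a diagnostic only. The class name `CodePointDump`, the helper `describe`, and the default file name are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Collectors;

public class CodePointDump {
    // Hypothetical helper: render each Unicode code point of s as "U+XXXX", separated by spaces
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) throws IOException {
        // The default path is an assumption; pass the real dataset as the first argument
        String path = args.length > 0 ? args[0] : "arabic.txt";
        // Files.newBufferedReader reports malformed UTF-8 by throwing an IOException
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            String line = reader.readLine();
            if (line != null) {
                // Output is pure ASCII, so the console encoding cannot garble it
                System.out.println(describe(line));
            }
        }
    }
}
```

If the dump shows code points in the Arabic block (U+0600 to U+06FF), the file was decoded correctly and the problem lies in how the console prints the text. If it shows U+003F, the file itself already contains literal question marks, and the data was damaged before Java ever read it.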
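Question: How do I read the file and print each line correctly? For more information, here is my Arabic dataset. The part of my source code that reads the data looks like this: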
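private ContextCountsImpl extractContextCounts(Map<Integer, String> phraseMap) throws IOException {
    // Decode the input file explicitly as UTF-8; try-with-resources closes the reader
    try (BufferedReader rdr = new BufferedReader(
            new InputStreamReader(new FileInputStream(inputFile), StandardCharsets.UTF_8))) {
        String line;
        // readLine() returns null at end of file; ready() is not an end-of-file test
        while ((line = rdr.readLine()) != null) {
            System.out.println(line);
            List<String> phrases = splitLineInPhrases(line);
            // any processing of the file goes here
        }
    }
    // ... (building and returning the ContextCountsImpl is omitted in the original)
}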
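The body of `splitLineInPhrases` is not shown. Since the words on each line are separated by spaces, the following is a minimal sketch of splitting a line into words on runs of whitespace. `SplitWords` and `splitWords` are hypothetical stand-ins, and they only split words. They do not build multi-word phrases.

```java
import java.util.Arrays;
import java.util.List;

public class SplitWords {
    // Hypothetical stand-in for splitLineInPhrases: split a line into words on runs of whitespace
    static List<String> splitWords(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            // "".split("\\s+") would return [""], so handle blank lines explicitly
            return List.of();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        // Two Arabic words written as escapes so the source file's encoding does not matter
        List<String> words = splitWords("  \u0643\u062A\u0627\u0628   \u0642\u0644\u0645 ");
        System.out.println(words.size());
    }
}
```

By default `\\s` matches only ASCII whitespace, which is enough for plain spaces as described in the question. Separators such as U+00A0 would need `Pattern.UNICODE_CHARACTER_CLASS`.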
Answer (score: 0)
Reading the file as UTF-8 works for me. Could you try it like this?
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadArabic {
    public static void main(String[] args) {
        // Decode explicitly as UTF-8; omitting the charset falls back to the platform default
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(new FileInputStream("arabic.txt"), StandardCharsets.UTF_8))) {
            String line;
            // readLine() returns null at end of file
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            System.err.println(e.getMessage()); // report I/O errors such as a missing file
        }
    }
}
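Sometimes the file is decoded as UTF-8 but the console still shows `?`. A common cause is that `System.out` encodes text with a default charset that has no mapping for Arabic letters, and unmappable characters are written as `?`. The sketch below wraps the output in a `PrintStream` that encodes as UTF-8. It assumes Java 10 or later for the `PrintStream(OutputStream, boolean, Charset)` constructor. The class `PrintArabicUtf8` and the helper `utf8Stream` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PrintArabicUtf8 {
    // Hypothetical helper: wrap an OutputStream in an auto-flushing PrintStream that encodes text as UTF-8
    static PrintStream utf8Stream(OutputStream out) {
        return new PrintStream(out, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The UTF-8 bytes pass through System.out unchanged
        PrintStream out = utf8Stream(System.out);
        // "arabic.txt" matches the answer's file name; pass another path as the first argument
        String path = args.length > 0 ? args[0] : "arabic.txt";
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.println(line);
            }
        }
    }
}
```

The terminal must also decode UTF-8 and use a font with Arabic glyphs (on the Windows console, for example, `chcp 65001`). Otherwise the correct bytes can still be displayed as boxes or question marks.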