Question

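I am trying to read a file that contains English and Arabic characters on each line, and another file that contains English and Chinese characters on each line. However, the Arabic and Chinese characters are not displayed correctly; they just appear as question marks. Any idea how I can solve this problem?

Here is the code I use for reading: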

try {
    String sCurrentLine;
    BufferedReader br = new BufferedReader(new FileReader(directionOfTargetFile));
    int counter = 0;

    while ((sCurrentLine = br.readLine()) != null) {
        String lineFixedHolder = converter.fixParsedParagraph(sCurrentLine);
        System.out.println("The line number " + counter
                           + " contain : " + sCurrentLine);
        counter++;
    }
} catch (IOException e) {
    e.printStackTrace();
}

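Edit 01

After reading the line and extracting the Arabic and Chinese words, I use a function to translate them by simply searching for the given Arabic text in an ArrayList (which contains all the expected words) using the indexOf() method. When the word's index is found, it is used to fetch the English word that has the same index in another ArrayList. However, this search always fails, because it is searching for question marks instead of the Arabic and Chinese characters. So my System.out.println output shows nulls, one for each failed translation.

* I am using the NetBeans 6.8 IDE, Mac version.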

Edit 02

Here is the code that searches for the translation:

        int testColor = dbColorArb.indexOf(wordToTranslate);
        int testBrand = -1;
        if ( testColor != -1 ) {
            String result = (String)dbColorEng.get(testColor);
            return result;
        } else {
            testBrand = dbBrandArb.indexOf(wordToTranslate);
        }
        //System.out.println ("The testBrand is : " + testBrand);
        if ( testBrand != -1 ) {
            String result = (String)dbBrandEng.get(testBrand);
            return result;
        } else {
            //System.out.println ("The first null");
            return null;
        }

I am actually searching two ArrayLists that might contain the word to be translated. If the word cannot be found in either of them, null is returned.
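To illustrate why the lookup described above returns null once the text has been reduced to question marks, here is a minimal sketch of a parallel-list lookup. The class name, the word lists, and the `translate` helper are hypothetical stand-ins, not the asker's actual data or method; the Arabic words are written as Unicode escapes so the example does not depend on the source file's encoding.

```java
import java.util.Arrays;
import java.util.List;

class ParallelListLookup {
    // Hypothetical sample data: each Arabic word shares its index with its English equivalent.
    static final List<String> ARABIC = Arrays.asList(
            "\u0623\u062D\u0645\u0631",   // "red"
            "\u0623\u0632\u0631\u0642");  // "blue"
    static final List<String> ENGLISH = Arrays.asList("red", "blue");

    // Returns the English word at the same index, or null when the word is not found.
    static String translate(String word) {
        int index = ARABIC.indexOf(word);
        return index != -1 ? ENGLISH.get(index) : null;
    }

    public static void main(String[] args) {
        // Correctly decoded text matches an entry in the list.
        System.out.println(translate("\u0623\u0632\u0631\u0642")); // prints "blue"
        // Text that was decoded into question marks matches nothing.
        System.out.println(translate("????"));                     // prints "null"
    }
}
```

indexOf() compares strings with equals(), so a word whose characters were replaced during decoding can never match, regardless of how the lists are populated.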

Edit 03

When I debug, I find that the line being read is stored in my String variable as follows:

 "3;0000000000;0000001001;1996-06-22;;2010-01-27;����;;01989;������;"

Edit 04

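The file I am reading was given to me after being modified by another program (about which I know nothing, other than that it is made in VB), and that program made the Arabic letters appear incorrectly. When I checked the file's encoding in Notepad++, it showed ANSI. However, when I convert it to UTF-8 (which replaces the Arabic letters with other English letters) and then convert it back to ANSI, the Arabic turns into question marks!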
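One plausible way literal question marks can end up in such a file is a lossy conversion: when text is encoded into a charset that has no mapping for a character, Java substitutes the charset's replacement byte, which for ISO-8859-1 is '?'. The sketch below is only an illustration of that mechanism, not a reconstruction of what Notepad++ or the VB program did. The class and method names are hypothetical, ISO-8859-1 stands in for a Western "ANSI" code page, and Java 7 or later is assumed for StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LossyEncodingDemo {
    // Encodes the text with the given charset and decodes it again with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String arabic = "\u0623\u062D\u0645\u0631"; // four Arabic letters

        // ISO-8859-1 cannot represent Arabic, so every letter becomes '?'.
        System.out.println(roundTrip(arabic, StandardCharsets.ISO_8859_1)); // prints "????"

        // UTF-8 can represent every Unicode character, so the text survives.
        System.out.println(roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic)); // prints "true"
    }
}
```

Once the replacement has been written to disk, the original letters are gone; no choice of charset on the reading side can bring them back.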

Answer 1

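From the FileReader javadoc:

Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.

So: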

Reader reader = new InputStreamReader(new FileInputStream(fileName), "utf-8");
BufferedReader br = new BufferedReader(reader);

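If this still doesn't work, then your console may not be set up to display UTF-8 characters correctly. The configuration depends on the IDE used and is fairly simple.

Update: in the code above, replace utf-8 with cp1256. This works fine for me (WinXP, JDK6).

But I'd recommend that you insist on the file being generated in UTF-8, because cp1256 won't work for Chinese and you'll run into similar problems again.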
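To make the difference between the two charset names concrete, the following sketch decodes the same bytes twice. It assumes the JDK in use ships the windows-1256 charset (standard full JDK builds do, but it is not one of the charsets every Java platform is required to support); the class and method names are hypothetical, and Java 7 or later is assumed for StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeComparison {
    // Decodes raw bytes with the given charset; malformed input becomes U+FFFD.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        Charset cp1256 = Charset.forName("windows-1256"); // assumed to be available
        String original = "\u0623\u062D\u0645\u0631";

        // Simulate a file written by a program that used the Arabic Windows code page.
        byte[] raw = original.getBytes(cp1256);

        // Decoding with the charset the bytes were written in restores the text.
        System.out.println(decode(raw, cp1256).equals(original)); // prints "true"

        // Decoding the same bytes as UTF-8 yields replacement characters instead.
        System.out.println(decode(raw, StandardCharsets.UTF_8).indexOf('\uFFFD') >= 0); // prints "true"
    }
}
```

The reader can only succeed if it is told the charset the writer actually used, which is why a single encoding that covers both Arabic and Chinese is the safer choice for the files.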

Answer 2

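It is most likely reading the information in correctly; however, your output stream is probably not UTF-8, so any character that cannot be represented in the output character set is replaced with '?'.

You can confirm this by taking each character and printing its ordinal (code point) value.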
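The check suggested above could be sketched as follows. The class and the `describe` helper are hypothetical names; the idea is simply to print code points in ASCII so the result does not depend on the console's encoding.

```java
class CodePointDump {
    // Formats every code point of the text as U+XXXX, separated by spaces.
    static String describe(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", codePoint));
            i += Character.charCount(codePoint); // skip both halves of a surrogate pair
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Correctly decoded Arabic shows code points in the U+0600 block.
        System.out.println(describe("\u0623\u062D\u0645\u0631")); // prints "U+0623 U+062D U+0645 U+0631"
        // Text that was already damaged shows U+003F ('?') or U+FFFD.
        System.out.println(describe("??"));                       // prints "U+003F U+003F"
    }
}
```

If the values fall in the Arabic or CJK ranges, the string was read correctly and only the display is at fault; if they are U+003F or U+FFFD, the damage happened while reading or earlier.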

Answer 3

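public void writeTiFile(String fileName, String str) {
    // try-with-resources (Java 7+) closes the stream even if the write fails
    try (FileOutputStream out = new FileOutputStream(fileName)) {
        // Encode the text as windows-1256 instead of the platform default charset
        out.write(str.getBytes("windows-1256"));
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}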

Why does Java BufferedReader() not read Arabic and Chinese characters correctly?
