Working with the UTF-8 identifier

Asked: 2015-06-05 09:40:17

Tags: java android encoding utf-8

I receive a string from an HTTP request. The stream looks like this:

<?xml version="1.0" encoding="utf-8"?>

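The first three characters indicate that the String is encoded as UTF-8.

I am creating files from this String. When I read them back, I get an error.

This is the method I use to create the file from the string: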

private void writeToFile(String data, String fileName) {
    try {
        String UTF8 = "UTF-8";
        int BUFFER_SIZE = 8192;

        String xmlCut = data.substring(3);

        File sdCard = Environment.getExternalStorageDirectory();
        File dir = new File (sdCard.getAbsolutePath()+"/example/Test");
        dir.mkdirs();
        File file = new File(dir,fileName);

        FileOutputStream f = new FileOutputStream(file);
        FileOutputStream fileOutputStream = openFileOutput(fileName, Context.MODE_PRIVATE);
        BufferedWriter bufferedWriter = new BufferedWriter(new OutputStreamWriter(fileOutputStream,UTF8),BUFFER_SIZE);
        bufferedWriter.write(String.valueOf(data.getBytes("UTF-8")));
        f.write(data.getBytes("UTF-8"));
        f.close();
        bufferedWriter.close();
    } catch (IOException e) {
        Log.e("writeToFile: ", "Datei-Erstellung fehlgeschlagen: " + e.toString());
    }

}

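As you can see, I added the substring call to remove the first three characters, because they cause the crash. The problem is that the file is then encoded as ASCII.

The method that reads the file: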

 private String readFromFile(String fileName) {
    String ret = "";
    String UTF8 = "UTF-8";
    int BUFFER_SIZE = 8192;

    try {
        InputStream inputStream = openFileInput(fileName);

        if (inputStream != null) {


            BufferedReader bufferedReader1 = new BufferedReader(new InputStreamReader(inputStream,UTF8),BUFFER_SIZE);
            String receiveString = "";
            StringBuilder stringBuilder = new StringBuilder();

            while ((receiveString = bufferedReader1.readLine()) != null) {
                stringBuilder.append(receiveString);
            }

            inputStream.close();
            ret = stringBuilder.toString();
        }
    } catch (FileNotFoundException e) {
        Log.e("readFromFile: ", "Datei nicht gefunden: " + e.toString());
    } catch (IOException e) {
        Log.e("readFromFile: ", "Kann Datei nicht lesen: " + e.toString());
    }
    return ret;
}

If I do not cut off the UTF-8 marker, I get this error in the stack trace:

Caused by: java.lang.NullPointerException: Attempt to invoke interface method 'org.w3c.dom.NodeList org.w3c.dom.Document.getElementsByTagName(java.lang.String)' on a null object reference
        at de.example.app.ListViewActivity.setListProjectData(ListViewActivity.java:226)

It happens right here:

public void setListProjectData(String filename) {

    XMLParser parser = new XMLParser();
    String xmlData = readFromFile(filename);
    String xmlCut = xmlData.substring(3);
    Document doc = parser.getDomElement(filename);

    NodeList nodeListProject = doc.getElementsByTagName(KEY_PROJECT);


    for (int i = 0; i < nodeListProject.getLength(); i++) {

        HashMap<String, String> map = new HashMap<String, String>();
        Element e = (Element) nodeListProject.item(i);

        map.put(KEY_UUID, parser.getValue(e, KEY_UUID));
        map.put(KEY_NAME, parser.getValue(e, KEY_NAME));
        map.put(KEY_JOBTITLE, parser.getValue(e, KEY_JOBTITLE));
        map.put(KEY_JOBINFO, parser.getValue(e, KEY_JOBINFO));
        map.put(KEY_PROJECTIMAGE, parser.getValue(e, KEY_PROJECTIMAGE));


        projectItems.add(map);
    }
}

This is where I fetch the HTTP data:

public String getXMLFromUrl(String url) {
    String xml = null;

    if (cd.isConnectingToInternet()) {
        try {
            //defaultHttpClient
            DefaultHttpClient httpClient = new DefaultHttpClient();
            HttpPost httpPost = new HttpPost(url);

            HttpResponse httpResponse = httpClient.execute(httpPost);
            HttpEntity httpEntity = httpResponse.getEntity();
            /*
            final InputStream in = httpEntity.getContent();
            Reader reader = new InputStreamReader(in,"UTF-8");
            InputSource is = new InputSource(reader);
            is.setEncoding("UTF-8");

            */
            xml = EntityUtils.toString(httpEntity);

        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    } else {
        return null;
    }

    return xml;
}

So how do I encode this as UTF-8? Am I doing this right?

1 Answer:

Answer 0: (score: 0)

Your problem is not in the code that handles the file, but in the code that gets the data from the HTTP request.

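You are passing String data to the writeToFile method. Strings in Java are UTF-16 encoded. If UTF-8 encoded bytes were decoded into that string with the wrong charset, encoding the string as UTF-8 again will not repair the broken data.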
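To illustrate why the data cannot be fixed afterwards, here is a minimal sketch in plain Java (no Android or HttpClient classes). The class name WrongCharsetDemo, the helper decodeAs and the sample text are made up for illustration, and it assumes the response bytes were UTF-8 but got decoded with a different charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WrongCharsetDemo {
    // Hypothetical helper: turns raw bytes into a String using the given charset.
    static String decodeAs(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "Müller" as UTF-8 bytes; the "ü" takes two bytes (0xC3 0xBC).
        byte[] raw = "M\u00FCller".getBytes(StandardCharsets.UTF_8);

        String right = decodeAs(raw, StandardCharsets.UTF_8);
        String latin1 = decodeAs(raw, StandardCharsets.ISO_8859_1);
        String ascii = decodeAs(raw, StandardCharsets.US_ASCII);

        System.out.println(right.length());   // 6
        System.out.println(latin1.length());  // 7, the "ü" became two characters

        // Encoding the wrong string as UTF-8 again only double-encodes it.
        System.out.println(latin1.getBytes(StandardCharsets.UTF_8).length); // 9

        // With US-ASCII the two bytes are replaced by U+FFFD and are lost.
        System.out.println(ascii.indexOf('\uFFFD')); // 1
    }
}
```

The same effect explains the three odd characters at the start of the XML: a UTF-8 BOM (bytes EF BB BF) decoded as ISO-8859-1 turns into three separate characters.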

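You should use xml = EntityUtils.toString(httpEntity, HTTP.UTF_8) to decode the data properly. The second argument is the charset that is applied when the response itself does not declare one.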

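If the returned data contains a UTF-8 BOM, there is an additional problem. The line above will decode the data correctly, but it will leave a superfluous (and unwanted) BOM at the start of the string.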
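As a rough sketch of what that leftover BOM looks like: after correct UTF-8 decoding, the three BOM bytes become the single character U+FEFF at index 0, and it can also be dropped at the string level. The class BomDemo and the method stripLeadingBom are hypothetical names used only for this example.

```java
import java.nio.charset.StandardCharsets;

class BomDemo {
    static final char BOM = '\uFEFF';

    // Removes one leading U+FEFF if present; any other input is returned unchanged.
    static String stripLeadingBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // UTF-8 BOM (EF BB BF) followed by "<a/>".
        byte[] raw = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>' };
        String decoded = new String(raw, StandardCharsets.UTF_8);

        System.out.println(decoded.length());          // 5: the BOM character plus "<a/>"
        System.out.println(stripLeadingBom(decoded));  // <a/>
    }
}
```

Unlike a fixed substring(3), this only removes something when a BOM is really there, so a response without a BOM passes through untouched.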

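To solve this, either the server has to return the data without a BOM, or you have to remove the BOM yourself. To do that, you can use the following code (or something similar):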

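import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Reads the whole stream as UTF-8 and drops a leading UTF-8 BOM (EF BB BF) if present.
public static String stripBOM(InputStream stream)
{
    try
    {
        byte[] buffer = new byte[1024];
        ByteArrayOutputStream os = new ByteArrayOutputStream(1024);
        int bytesRead;
        while ((bytesRead = stream.read(buffer)) != -1)
        {
            os.write(buffer, 0, bytesRead);
        }
        byte[] bytes = os.toByteArray();
        // Skip the first three bytes only if they really are the BOM.
        boolean hasBom = bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
        int offset = hasBom ? 3 : 0;
        return new String(bytes, offset, bytes.length - offset, "UTF-8");
    }
    catch (IOException e)
    {
        // Returns an empty string on any read or decoding failure.
        return "";
    }
}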

So xml = EntityUtils.toString(httpEntity, HTTP.UTF_8) can be replaced with

 InputStream is = httpEntity.getContent();
 xml = stripBOM(is);
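Once the string has been decoded correctly, writing and reading it as UTF-8 is straightforward. Note that the writeToFile method in the question writes String.valueOf(data.getBytes("UTF-8")), which is the array's identity string (something like [B@1a2b3c) and not the text. Below is a minimal sketch of a round trip using plain java.io and a temporary file instead of the Android openFileOutput and openFileInput calls; the names Utf8FileRoundTrip, writeUtf8 and readUtf8 are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

class Utf8FileRoundTrip {
    // Writes the string itself through a UTF-8 writer; no manual getBytes() call is needed.
    static void writeUtf8(File file, String data) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            writer.write(data);
        }
    }

    // Reads the whole file as UTF-8, keeping line breaks as they are.
    static String readUtf8(File file) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            StringBuilder builder = new StringBuilder();
            char[] buffer = new char[8192];
            int charsRead;
            while ((charsRead = reader.read(buffer)) != -1) {
                builder.append(buffer, 0, charsRead);
            }
            return builder.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("utf8-demo", ".xml");
        file.deleteOnExit();

        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<name>M\u00FCller</name>";
        writeUtf8(file, xml);

        System.out.println(xml.equals(readUtf8(file))); // true
    }
}
```

On Android the same writer and reader can wrap the streams returned by openFileOutput and openFileInput. Reading with a char buffer rather than readLine() also keeps the line breaks, which the readFromFile method in the question drops.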