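Reading a UTF-8 encoded text file from the Internet in Java

Time: 2012-08-01 12:21:02

Tags: java utf-8

I want to read an XML file from the Internet. You can find it here. The problem is that it is encoded in UTF-8, and I need to store it in a file so that I can parse it later. I have read a lot of threads on this topic, and this is what I came up with: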

BufferedReader in;
String readLine;
try
{
    in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
    BufferedWriter out = new BufferedWriter(new FileWriter(file));

    while ((readLine = in.readLine()) != null)
        out.write(readLine+"\n");

    out.close();
}

catch (UnsupportedEncodingException e)
{
    e.printStackTrace();
}

catch (IOException e)
{
    e.printStackTrace();
}

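This code works up to this line: <title>Chérie FM</title>
When I debug, I get this: <title>Ch�rie FM</title>

Clearly there is something I am not understanding, but as far as I can tell I followed the code I found on several websites.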

2 answers:

Answer 0: (score: 8)

This file is not encoded as UTF-8 but as ISO-8859-1.

Change your code to:

BufferedReader in;
String readLine;
try
{
    in = new BufferedReader(new InputStreamReader(url.openStream(), "ISO-8859-1"));
    BufferedWriter out = new BufferedWriter(new OutputStreamWriter( new FileOutputStream(file) , "UTF-8"));

    while ((readLine = in.readLine()) != null)
        out.write(readLine+"\n");
    out.flush();
    out.close();
}

catch (UnsupportedEncodingException e)
{
    e.printStackTrace();
}

catch (IOException e)
{
    e.printStackTrace();
}

You should then get the expected result.
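
To illustrate why the charset passed to InputStreamReader matters, here is a minimal sketch that decodes the same bytes both ways. The class and method names are made up for this example, and the byte array is an assumption standing in for what the server sends: in ISO-8859-1 the "é" is the single byte 0xE9, which is not a valid sequence on its own in UTF-8, so a UTF-8 decoder substitutes the replacement character U+FFFD.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
class EncodingMismatchDemo {

    // Decode the raw bytes as ISO-8859-1 (one byte per character).
    public static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Decode the same raw bytes as UTF-8; malformed input becomes U+FFFD.
    public static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed sample: "Ch?rie" where the accented e is the Latin-1 byte 0xE9.
        byte[] raw = {'C', 'h', (byte) 0xE9, 'r', 'i', 'e'};

        String latin1 = decodeLatin1(raw);
        String utf8 = decodeUtf8(raw);

        // Print code points so the result does not depend on the console encoding.
        System.out.printf("ISO-8859-1: U+%04X%n", (int) latin1.charAt(2));
        System.out.printf("UTF-8:      U+%04X%n", (int) utf8.charAt(2));
    }
}
```

Reading with the charset the file really uses gives back the accented character, and writing through an OutputStreamWriter with "UTF-8" then stores it correctly in the local file.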

Answer 1: (score: -1)

If you need to write the file in a given encoding, use FileOutputStream instead.

in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
FileOutputStream out = new FileOutputStream(file);

// Encode each line explicitly as UTF-8 before writing the raw bytes
while ((readLine = in.readLine()) != null)
    out.write((readLine+"\n").getBytes("UTF-8"));

out.close();
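
Whichever writer is used, the idea in both answers is to name the charset explicitly on the reading side and on the writing side. As a rough sketch of that idea, assuming Java 7 or later, the helper below uses try-with-resources so both streams are closed even if an exception is thrown. The names Transcoder and saveAsUtf8 are invented for this example, and the source charset is a parameter because the caller has to know (or find out) what the server actually sends.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original answers.
class Transcoder {

    // Read text from 'source' using 'sourceCharset' and write it to 'target' as UTF-8.
    public static void saveAsUtf8(InputStream source, Charset sourceCharset, Path target)
            throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(source, sourceCharset));
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.write('\n'); // readLine() strips the line terminator, so add one back
            }
        } // both reader and writer are closed here, which also flushes the writer
    }
}
```

With the variables from the question, a call might look like saveAsUtf8(url.openStream(), StandardCharsets.ISO_8859_1, file.toPath()). Like the original loop, this normalizes line endings to "\n" and always ends the file with a newline.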