Question

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

// Enclosing class name assumed; the question shows only main().
public class Main {

    public static void main(String[] args) {
        URL url;

        try {
            // get URL content
            url = new URL("http://mp3.zing.vn/album/Chuyen-Tinh-Nha-Tho-Single-Van-Mai-Huong/ZWZAWZAZ.html");
            URLConnection conn = url.openConnection();

            // open the stream and put it into BufferedReader
            BufferedReader br = new BufferedReader(
                               new InputStreamReader(conn.getInputStream()));

            String inputLine;

            //save to this filename
            String fileName = "G:\\test1.txt";
            File file = new File(fileName);

            if (!file.exists()) {
                file.createNewFile();
            }

            //use FileWriter to write file
            FileWriter fw = new FileWriter(file.getAbsoluteFile());
            BufferedWriter bw = new BufferedWriter(fw);

            while ((inputLine = br.readLine()) != null) {
                bw.write(inputLine);
            }

            bw.close();
            br.close();

            System.out.println("Done");

        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }

    }
}

When I run this code in NetBeans, the text in test1.txt is fine. When I run it in Eclipse, the result is:

�     
�{�ב'�w��*_��f���b��/�%uu����DY[��̪JvUf�2�̬p1�;�w��w�;�}}yV��`0$�}�'���sN>��/���^A��<�8q"�ĉ��뛻�����?�k{�ַ�Z"��<(ld2��M���ƶ�Kg�d~&����=�.g2�����B�u3���  ���j�k��:i�7���-��d�w��-�j���H�n�,ݤ/��o�}ku�7>}��o�y�?����;���}�x`;ݾuCKi����������w�|�t�'�Z�=h�|V뻞׷<�VF4H��X��Ô���>ZIl��o9~�y:��!~�$|�����2z�ȳ�{�۩jB�0��GX

Can someone please help me fix this? Thanks!

Answer 1

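The site you are retrieving uses an encoding that you are not handling correctly. A quick look at the site shows that it is UTF-8 encoded, so you need to take that into account when reading the data. InputStreamReader provides an option for this in its constructor: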

BufferedReader br = new BufferedReader(
                           new InputStreamReader(conn.getInputStream(), "UTF8"));
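Instead of hard-coding UTF-8 after inspecting the site, a program can also take the charset from the server's `Content-Type` header when one is sent. The sketch below assumes the header carries a `charset=` parameter (not verified for this site; the page's `<meta>` tag does declare UTF-8) and falls back to a caller-supplied default otherwise. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;

public class ContentTypeCharset {

    /**
     * Extracts the charset parameter from a Content-Type value such as
     * "text/html;charset=UTF-8". Returns the fallback if the value is null,
     * has no charset parameter, or names a charset this JVM does not support.
     */
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // Strip optional quotes, e.g. charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetFromContentType("text/html;charset=UTF-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetFromContentType("text/html", StandardCharsets.UTF_8));
    }
}
```

In the question's program this could be used as `new InputStreamReader(conn.getInputStream(), ContentTypeCharset.charsetFromContentType(conn.getContentType(), StandardCharsets.UTF_8))`. `URLConnection.getContentType()` returns `null` when the header is absent, which the sketch treats as "use the fallback".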

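After some testing, I confirmed that on my machine your code actually works fine, because my default encoding is UTF-8 (the system default charset is used if you don't specify one). That may or may not be the case for you; try printing the encoding to see what you are reading with: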

System.out.println(new InputStreamReader(conn.getInputStream()).getEncoding());
// prints "UTF8" on my machine.
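To see both what your JVM falls back to and what a mismatch looks like, the minimal sketch below prints `Charset.defaultCharset()` (the charset `InputStreamReader` uses when none is given) and decodes the same UTF-8 bytes with the right charset and a wrong one. ISO-8859-1 is only an example of a non-UTF-8 default; the default in your Eclipse setup may differ, and each IDE may launch the program with a different default, which could explain the NetBeans/Eclipse difference. Since JDK 18 the default charset is UTF-8 on all platforms (JEP 400). The Vietnamese title is written with Unicode escapes so the demo does not itself depend on the source file's encoding; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    /** Encodes text as UTF-8 (as the server does), then decodes it with readerCharset. */
    static String decodeUtf8BytesAs(String text, Charset readerCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8); // what the server sends
        return new String(utf8Bytes, readerCharset);               // what the program sees
    }

    public static void main(String[] args) {
        // The charset an InputStreamReader uses when none is passed to its constructor.
        System.out.println("Default charset: " + Charset.defaultCharset());

        String title = "Chuy\u1ec7n T\u00ecnh Nh\u00e0 Th\u01a1";
        System.out.println("Decoded as UTF-8:      " + decodeUtf8BytesAs(title, StandardCharsets.UTF_8));
        System.out.println("Decoded as ISO-8859-1: " + decodeUtf8BytesAs(title, StandardCharsets.ISO_8859_1));
    }
}
```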

Even so, always specifying the charset is best practice, so that your code does not depend on the platform default.
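Specifying a charset on the reading side is only half of it: the question's `FileWriter` also encodes with the platform default, and the read loop drops line breaks because `readLine()` strips them. The minimal sketch below fixes both, assuming UTF-8 is the desired encoding for the output file. `copyToFile` is a hypothetical helper, and an in-memory stream stands in for `conn.getInputStream()` so the sketch runs offline. (On Java 11 and later, `new FileWriter(file, StandardCharsets.UTF_8)` is an alternative to `Files.newBufferedWriter`.)

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CopyWithCharsets {

    /**
     * Decodes text from `in` using sourceCharset and writes it to `target` as UTF-8,
     * restoring the line breaks that readLine() removes.
     */
    static void copyToFile(InputStream in, Charset sourceCharset, Path target) throws IOException {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, sourceCharset));
             BufferedWriter bw = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                bw.write(line);
                bw.newLine(); // writes the platform line separator
            }
        } // try-with-resources closes both streams, even on error
    }

    public static void main(String[] args) throws IOException {
        // Assumption: in the real program this stream would be conn.getInputStream().
        byte[] page = "<title>Chuy\u1ec7n T\u00ecnh</title>\n".getBytes(StandardCharsets.UTF_8);
        Path out = Files.createTempFile("test1", ".txt");
        copyToFile(new ByteArrayInputStream(page), StandardCharsets.UTF_8, out);
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8));
    }
}
```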

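If the printout above shows UTF8, or you still see unexpected results after specifying the charset, the problem may be the editor you are using to view the output file. Make sure your text editor can handle UTF-8 and you should be fine. This is what I see in Sublime Text 3: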

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"> <head>              <title>Chuyện Tình Nhà Thơ (Single) - Văn Mai Hương | Album 320 lossless</title>        <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
...
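To tell whether the problem is the file or the editor, you can strictly decode the file's bytes as UTF-8. The sketch below reports malformed input instead of silently replacing it with U+FFFD. A passing check does not prove the file was meant to be UTF-8 (plain ASCII also passes), but a failing one shows the file was not written as valid UTF-8. `Utf8Check` and `isValidUtf8` are hypothetical names; the default path is the one from the question.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8Check {

    /** Returns true if the bytes decode as UTF-8 without any malformed sequences. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the output path used in the question.
        Path file = Paths.get(args.length > 0 ? args[0] : "G:\\test1.txt");
        byte[] bytes = Files.readAllBytes(file);
        System.out.println(isValidUtf8(bytes)
                ? "Valid UTF-8: if it still looks wrong, check the editor's encoding setting."
                : "Not valid UTF-8: the file itself was not written as UTF-8.");
    }
}
```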

My text is garbled when I get HTML from a URL
