Question

I'm trying to download this page: http://www.csfd.cz/film/895-28-dni-pote/prehled/. I'm using this code:

    URL url = new URL("http://www.csfd.cz/film/895-28-dni-pote/prehled/");
    try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream(), Charset.forName("UTF-8")))) {
        String line = br.readLine();
        while (line != null) {
            System.out.println(line);
            line = br.readLine();
        }
    }

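It works on some other pages, but for this one it gives me strange symbols. For example, the second line I get is: " \ ? c n ". (That did not copy over exactly as I see it in the Eclipse console.)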

As far as I can tell, I am using UTF-8 encoding. In case you are wondering, the page is in Czech. Thanks for your help.

Answer 1

$ curl -D- http://www.csfd.cz/film/895-28-dni-pote/prehled/
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 01 Feb 2016 08:11:36 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: close
X-Frame-Options: SAMEORIGIN
X-Powered-By: Nette Framework
Vary: X-Requested-With
X-From-Cache: TRUE
Content-Encoding: gzip

▒}I▒▒▒▒^▒▒29B▒▒▒$R▒M▒$nER▒▒4X, @
etc....

Note the Content-Encoding: gzip header: the content is compressed with gzip, so you need to decompress it before you can use it.

Look into the classes in java.util.zip, especially GZIPInputStream, which I believe you can wrap around the regular input stream.
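As a rough sketch of that suggestion (not tested against this particular site), something like the following should work. The class name GzipPageDownloader and the helper methods decode, readAll, and download are made up for illustration; the only assumption about the server is that it reports compression through the Content-Encoding header, as in the curl output above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, named for illustration only.
class GzipPageDownloader {

    // Wrap the raw stream in a GZIPInputStream only when the response
    // header says the body is gzip-compressed; otherwise return it as-is.
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Read the (already decompressed) stream as UTF-8 text, line by line.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Open the connection, check Content-Encoding, and return the page text.
    static String download(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        try (InputStream raw = conn.getInputStream()) {
            return readAll(decode(raw, conn.getContentEncoding()));
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GzipPageDownloader <url>");
            return;
        }
        System.out.print(download(args[0]));
    }
}
```

Run it with the page address as the only argument. Checking the header instead of always wrapping the stream keeps the code working on pages that are served uncompressed, since GZIPInputStream throws an exception when the data is not in gzip format.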
