Java/Android: saving HTML from a website

Time: 2015-12-20 17:16:24

Tags: java android

In Java, I use this to download the HTML:

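import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

// Downloads the page at the given address and returns its HTML as one string.
public static String savePage(final String pageUrl) throws IOException {
    StringBuilder all = new StringBuilder();
    BufferedReader in = null;
    try {
        URL myUrl = new URL(pageUrl);
        // Decode as UTF-8 (assumed) instead of relying on the platform default charset.
        in = new BufferedReader(new InputStreamReader(myUrl.openStream(), "UTF-8"));

        String line;
        while ((line = in.readLine()) != null) {
            // readLine() strips the line terminator, so add it back.
            all.append(line).append('\n');
        }
    } finally {
        if (in != null) {
            in.close();
        }
    }

    return all.toString();
}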

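In plain Java, this code returns exactly the HTML I need. But when I run the same code on Android (Android Studio), the resulting HTML is incomplete and not what I need. All I want is the HTML exactly as it is served at the actual link.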

This is what the HTML looks like when I download it on Android:

<!DOCTYPE html><html lang="en-GB">  <head id="head">    <style
name="www-roboto">@font-face{font-family:'Roboto';font-style:italic;font-weight:400;src:url(//fonts.gstatic.com/s/roboto/v15/W4wDsBUluyw0tK3tykhXEXYhjbSpvc47ee6xR_80Hnw.ttf)format('truetype');}@font-face{font-family:'Roboto';font-style:normal;font-weight:400;src:url(//fonts.gstatic.com/s/roboto/v15/QHD8zigcbDB8aPfIoaupKOvvDin1pK8aKteLpeZ5c0A.ttf)format('truetype');}@font-face{font-family:'Roboto';font-style:normal;font-weight:500;src:url(//fonts.gstatic.com/s/roboto/v15/RxZJdnzeo3R5zSexge8UUSZ2oysoEQEeKwjgmXLRnTc.ttf)format('truetype');}@font-face{font-family:'Roboto';font-style:italic;font-weight:500;src:url(//fonts.gstatic.com/s/roboto/v15/OLffGBTaF0XFOW1gnuHF0SwlidHJgAgmTjOEEzwu1L8.ttf)format('truetype');}</style><script
name="www-roboto">if (document.fonts && document.fonts.load) {document.fonts.load("400 10pt Roboto", "E");document.fonts.load("500 10pt Roboto", "E");}</script>      <script>var ytcsi = {gt: function(n) {n = (n || '') + 'data_';return ytcsi[n] || (ytcsi[n] = {tick: {},span: {},info: {}});},tick: function(l, t, n) {ytcsi.gt(n).tick[l] = t || +new Date();},span: function(l, s, e, n) {ytcsi.gt(n).span[l] = (e ? e : +new Date()) - ytcsi.gt(n).tick[s];},setSpan: function(l, s, n) {ytcsi.gt(n).span[l]
= s;},info: function(k, v, n) {ytcsi.gt(n).info[k] = v;},setStart: function(s, t, n) {ytcsi.info('yt_sts', s, n);ytcsi.tick('_start', t, n);}};(function(w, d) {ytcsi.perf = w.performance || w.mozPerformance ||w.msPerformance || w.webkitPerformance;ytcsi.setStart('dhs', ytcsi.perf ? ytcsi.perf.timing.responseStart : null);var isPrerender = (d.visibilityState || d.webkitVisibilityState) == 'prerender';var vName = d.webkitVisibilityState ? 'webkitvisibilitychange' : 'visibilitychange';if (isPrerender) {ytcsi.info('prerender', 1);var startTick = function() {ytcsi.setStart('dhs');d.removeEventListener(vName, startTick);};d.addEventListener(vName, startTick, false);}if (d.addEventListener) {d.addEventListener(vName, function() {ytcsi.tick('vc');}, false);}})(window, document);</script>    <script>if (window.ytcsi) {window.ytcsi.tick("_start", null, 'initpb');}</script>    <script>if (window.ytcsi) {window.ytcsi.tick("_start", null, 'blz_watch_ads');}</script>    <script>if (window.ytcsi) {window.ytcsi.tick("_start", null, 'blz_home_ads');}</script>    <script>if (window.ytcsi) {window.ytcsi.tick("_start", null, 'blz_search_ads');}</script>      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, target-densityDpi=medium-dpi">  <link rel="icon" href="//s.ytimg.com/yts/favicon-vflz7uhzw.ico" type="image/x-icon">  <link rel="shortcut icon" href="//s.ytimg.com/yts/favicon-vflz7uhzw.ico" type="image/x-icon">   <title>YouTube</title>  <link rel="stylesheet" href="//s.ytimg.com/yts/cssbin/mobile-nirvana-tablet-mangled-vflylHmeV.css" id="page_css">  </head>  <body id="body" class="atom fusion-tn">       <script>      var original_url = encodeURIComponent(encodeURIComponent(encodeURIComponent(document.location.href))); var iframe_url = 
'https://accounts.google.com/ServiceLogin?continue=http%3A%2F%2Fwww.youtube.com%2Fsignin%3Fnext%3Dhttp%253A%252F%252Fm.youtube.com%252Fsignin_passive%253Foriginal_url%253DORIGINAL_URL_PLACE_HOLDER%26hl%3Den-GB%26feature%3Dmobile_passive%26app%3Dm%26action_handle_signin%3Dtrue&amp;hl=en-GB&amp;passive=true&amp;service=youtube&amp;uilel=3'.replace('ORIGINAL_URL_PLACE_HOLDER', original_url);      document.write('<iframe src=\"' + iframe_url + '\" style=\"width:0;height:0;margin:0;border-width:0;padding:0;position:absolute;\"></iframe>'); </script>  <div id="player"></div>  <div id="guide-layout-container">  <div id="guide-container"></div>    <div id="content-container">      <div id="content"></div>    </div>    <div id="guide-overlay"></div>   <div id="lightbox"></div>    <div id="toast"></div>    <div id="content-overlay"></div>  </div>  <div id="_yt_orientation_de

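This HTML is completely different from that of the website I am trying to download it from. I have tried many different ways of downloading the HTML from the site, and all of them give me incomplete, seemingly random HTML like this.

I have tried encoding the URL, and I have tried libraries meant for downloading HTML, still with no luck.

An explanation of this, and possibly code that does what I want, would be much appreciated. Android Java is new to me, so plenty of detail will help me understand better.

Thanks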

1 answer:

Answer 0: (score: 0)

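As noted in the comments, if you want the website to treat your request as not coming from a mobile device, you need to set the User-Agent header on the network request.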
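
A minimal sketch of that idea, using the standard `HttpURLConnection` API. The class name `DesktopPageFetcher`, the helper `openDesktopConnection`, the timeout values, and the particular User-Agent string are all assumptions made for illustration; any desktop-browser User-Agent value should serve the same purpose. Setting the header only changes which variant of the page the server chooses to send, so content that the page builds later with JavaScript would still be absent.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

class DesktopPageFetcher {
    // Assumption: an example desktop-browser User-Agent string, not a required value.
    static final String DESKTOP_USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36";

    // Prepares the request and sets the header; nothing is sent until the response is read.
    static HttpURLConnection openDesktopConnection(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        conn.setRequestProperty("User-Agent", DESKTOP_USER_AGENT);
        conn.setConnectTimeout(10000); // milliseconds, illustrative value
        conn.setReadTimeout(10000);    // milliseconds, illustrative value
        return conn;
    }

    // Same job as savePage in the question, but with the desktop User-Agent applied.
    static String savePage(String pageUrl) throws IOException {
        HttpURLConnection conn = openDesktopConnection(pageUrl);
        StringBuilder all = new StringBuilder();
        BufferedReader in = null;
        try {
            // Assumes the page is UTF-8 encoded.
            in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
            String line;
            while ((line = in.readLine()) != null) {
                all.append(line).append('\n');
            }
        } finally {
            if (in != null) {
                in.close();
            }
            conn.disconnect();
        }
        return all.toString();
    }
}
```

On Android, `savePage` has to be called from a background thread, because network access on the main thread throws `NetworkOnMainThreadException`, and the app manifest must declare the `android.permission.INTERNET` permission.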