HTML code retrieved with HttpGet doesn't return the whole page

Time: 2014-07-23 16:37:09

Tags: java android html jsoup http-get

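When I try to pull all of the HTML from this site, http://www.gasbuddy.com/GB_Price_List.aspx, only about half of the page ends up in my String.

I've tried several approaches I found on SO, and in other sources from Google searches, but none of them solved my problem.

Here is the code I use to retrieve the page: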

private class InternetGasBuddyConnection extends AsyncTask<String, String, String> {

    @Override
    protected String doInBackground(String... urls) {
        StringBuilder response = new StringBuilder(30000);
        DefaultHttpClient client = new DefaultHttpClient();
        // URL is assumed to be a field of the enclosing class (not shown here)
        HttpGet httpGet = new HttpGet(URL);
        String result = "";
        BufferedReader buffer = null;
        try {
            HttpResponse execute = client.execute(httpGet);
            InputStream content = execute.getEntity().getContent();

            buffer = new BufferedReader(new InputStreamReader(content));
            String s;
            // readLine() strips the line terminator, so put it back
            while ((s = buffer.readLine()) != null)
                response.append(s).append('\n');

            Log.d("before changing and parsing", response.toString());

            Document doc = Jsoup.parse(response.toString(), URL);

            result = response.toString();
            Log.d("no parsing", result);

            // from here on, result holds the HTML as re-serialized by jsoup
            result = doc.toString();

            Log.d("after parsing", result);

        } catch (Exception e) {
            // Log.e already prints the stack trace
            Log.e("Darrell", result, e);
        } finally {
            // always release the response stream
            if (buffer != null) {
                try {
                    buffer.close();
                } catch (IOException ignored) {
                }
            }
        }
        return result;
    }

    @Override
    public void onPostExecute(final String result) {
        Log.d("onPostExecute()", result);
        htmlDoc = result;
    }
}

When the Log.d("after parsing", result); call runs, this is what shows up in my logcat:

07-23 13:19:57.833: D/after parsing(32136): <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
07-23 13:19:57.833: D/after parsing(32136): <html xmlns="http://www.w3.org/1999/xhtml">
07-23 13:19:57.833: D/after parsing(32136):  <head> 
07-23 13:19:57.833: D/after parsing(32136):   <title>USA and Canada Current Average Gas Prices By City/State/Province - GasBuddy.com</title> 
07-23 13:19:57.833: D/after parsing(32136):   <base id="ctl00_head_base" href="http://www.gasbuddy.com/" /> 
07-23 13:19:57.833: D/after parsing(32136):   <script type="text/javascript" src="/js/menu_v3.js?q=11"></script> 
07-23 13:19:57.833: D/after parsing(32136):   <link href="/Style.css" rel="Stylesheet" /> 
07-23 13:19:57.833: D/after parsing(32136):   <link href="/css/main.css?q=13" rel="Stylesheet" /> 
07-23 13:19:57.833: D/after parsing(32136):   <meta http-equiv="pragma" content="no-cache" /> 
07-23 13:19:57.833: D/after parsing(32136):   <link rel="shortcut icon" href="/favicon.ico" /> 
07-23 13:19:57.833: D/after parsing(32136):   <!--[if lt IE 7]>        <link href="/css/main_ie6.css?q=11" rel="Stylesheet" />        <![endif]--> 
07-23 13:19:57.833: D/after parsing(32136):   <!-- PUT THIS TAG IN THE head SECTION -->
07-23 13:19:57.833: D/after parsing(32136):   <script type="text/javascript" src="http://partner.googleadservices.com/gampad/google_service.js"></script>
07-23 13:19:57.833: D/after parsing(32136):   <script type="text/javascript">  GS_googleAddAdSenseService("ca-pub-9634286501775085");  GS_googleEnableAllServices();</script>
07-23 13:19:57.833: D/after parsing(32136):   <script type="text/javascript">    var site;    var siteleft;    var siteright;    var site_length;    site="GasBuddy".toLowerCase();    site_length=site.length;    siteleft=site.substring(0,4);    siteright=site.substring(site_length-4,site_length);    site = siteleft + siteright;    GA_googleAddAttr("GasPri_URL", site);    </script>
07-23 13:19:57.833: D/after parsing(32136):   <script type="text/javascript">  GA_googleAddSlot("ca-pub-9634286501775085", "GasBuddy_Content_Top_728x90");  GA_googleAddSlot("ca-pub-9634286501775085", "GasBuddy_Content_Top_160x600");  GA_googleAddSlot("ca-pub-9634286501775085", "GasBuddy_Content_160x600_Bottom");</script>
07-23 13:19:57.833: D/after parsing(32136):   <script type="text/javascript">    GA_googleFetchAds();</script>
07-23 13:19:57.833: D/after parsing(32136):   <!-- END OF TAG FOR head SECTION -->
07-23 13:19:57.833: D/after parsing(32136):   <script type="text/javascript"> function getElementPosition(offsetTrail){    var offsetLeft = 0;    var offsetTop = 0;    while (offsetTrail){        offsetLeft += offsetTrail.offsetLeft;        offsetTop += offsetTrail.offsetTop;        offsetTrail = offsetTrail.offsetParent;    }    if (navigator.userAgent.indexOf('Mac') != -1 && typeof document.body.leftMargin != 'undefined'){        offsetLeft += document.body.leftMargin;        offsetTop += document.body.topMargin;    }    return {left:offsetLeft,top:offsetTop};}var ad_containers = [['divSkyscraper', 'divSky'], ['divLeaderboard','div728']];function moveAd() {  var i = 0;  for (i=0; i<ad_containers.length; i++){         if (document.getElementById(ad_containers[i][0])){         document.getElementById(ad_containers[i][0]).style.display = 'block';            document.getElementById(ad_containers[i][0]).style.position='absolute';            document.getElementById(ad_containers[i][0]).style.top=getElementPosition(document.getElementById(ad_containers[i][1])).top+"px";            document.getElementById(ad_containers[i][0]).style.left=getElementPosition(document.getElementById(ad_containers[i][1])).left+"px";         }    }        }    </script> 
07-23 13:19:57.833: D/after parsing(32136):   <script type="text/javascript">window._addWindowOnResize = function (func){if (typeof window.onresize == 'function'){var oldFunc = window.onresize;window.onresize = function() { oldFunc(); func(); }}else{window.onresize = func;}}</script> 
07-23 13:19:57.833: D/after parsing(32136):  </head> 
07-23 13:19:57.833: D/after parsing(32136):  <body> 
07-23 13:19:57.833: D/after parsing(32136):   <input id="adcoord" type="hidden" value="" /> 
07-23 13:19:57.833: D/after parsing(32136):   <input name="ctl00$serveradcoord" type="hidden" id="ctl00_serveradcoord" /> 
07-23 13:19:57.833: D/after parsing(32136):   <script type="text/javascript">        document.getElementById('adcoord').value= document.getElementById('ctl00_serveradcoord').value;    </script> 
07-23 13:19:57.833: D/after parsing(32136):   <form name="aspnetForm" method="post" action="GB_Price_List.aspx" id="aspnetForm">
07-23 13:19:57.833: D/after parsing(32136):    <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUJODA2MjYxNTkwD2QWAmYPZBYEAgIPFgIeBGhyZWYFGGh0dHA6Ly93d3cuZ2FzYnVkZHkuY29tL2QCBQ9kFgRmDxYCHgRUZXh0BXk8aW5wdXQgaWQ9ImFkY29vcmQiIHR5cGU9ImhpZGRlbiIgdmFsdWU9ImxhdD00NC45NzIwODQ5MTYxMDUmYW1wO2xuZz0tOTMuMjU1Mzg2MzUyNTM5JmFtcDtydD1zY3JpcHQmYW1wO2NiPTQzMDA0MDc2OD4iIC8+ZAIJDxYCHwFlZGQFCm9rVVAnH00/+plv75hl5Bjosg==" />
07-23 13:19:57.833: D/after parsing(32136):    <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWAgLon7DQAgL9t/yMB9Wx1nzDGAm5Ha

As you can see, the last line is not the closing HTML tag </html>. Why isn't the whole page being put into the string?
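One thing worth ruling out at this point is that the string is complete and only the log output is cut short: logcat limits the size of a single log entry to roughly 4 KB, so a long message passed to one Log.d call is truncated in the output. A minimal sketch for checking this is below. The class name LogChunks, the method split, and the chunk size of 3000 characters are assumptions made for illustration, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;

class LogChunks {
    // Assumed chunk size, chosen to stay under logcat's per-entry limit
    // of about 4 KB for mostly-ASCII text such as HTML.
    static final int CHUNK = 3000;

    // Splits text into consecutive pieces of at most `size` characters.
    static List<String> split(String text, int size) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < text.length(); i += size) {
            parts.add(text.substring(i, Math.min(text.length(), i + size)));
        }
        return parts;
    }
}
```

Inside doInBackground, each element of LogChunks.split(result, LogChunks.CHUNK) could be passed to its own Log.d("after parsing", part) call, and logging result.length() once shows how many characters were actually received.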

The rest of the HTML page looks like this (only part of the rest, since I've hit the post body limit...):

</div> 

<div> 

<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWAgK6gqnzAwL9t/yMBxUlbb+cRAiGT6aQPAFgtAU0IQZv" /> 
</div> 
<input id="adcoord" type="hidden" value="lat=44.972084916105&amp;lng=-93.255386352539&amp;rt=script&amp;cb=257130208>" /> 
<div id="main_wrapper" > 


<style type="text/css"> 
a.social { 
background-image: url(/images/art/social_small_sp.png); 
background-repeat: no-repeat; 
padding: 3px 0px 3px 25px; 
margin: 0px 10px 0px 0px; 
text-decoration: underline; 
} 

a.social:hover { 
text-decoration: underline; 
} 

a.fb { background-position: 0px 2px;} 
a.tw { background-position: 0px -35px;} 

</style> 
<div style="font-size: 12px; height: 21px;"> 
<div style="float: left; padding-top: 3px;"> 
<a id="ctl00_GBTP_HyperLink1" href="Registration.aspx">[Become A Member]</a>&nbsp; 

<a id="ctl00_GBTP_HyperLink2" href="GB_Mem_log_in.aspx">[Log In]</a> 

</div> 
<div style="float: right; padding-top: 3px;"> 
<b>Follow Us</b>&nbsp;&nbsp;&nbsp; 
<a href="http://www.facebook.com/gasbuddy" target="_blank" class="social fb">Facebook</a> 

<a href="http://twitter.com/gasbuddy" target="_blank" class="social tw">Twitter</a> 

</div> 
</div> 


<style type="text/css"> 

td.gb_h_search { 
width: 240px; 
vertical-align: bottom; 
padding-bottom: 5px; 
font-size: 0px; 
} 

td.gb_h_search span { 
font-weight: bold; 
color: #555555; 
font-size: 17px; 
} 

td.gb_h_search div { 
margin-top: 2px; 
} 

td.gb_h_class { 
vertical-align: bottom; 
padding-bottom: 5px; 
} 

td.gb_h_class a { 
font-weight: bold; 
padding-left: 20px; 
} 

</style> 


<div id="header" onkeydown="return txtSearch_click(event);"> 
<table cellspacing="0" cellpadding="0" border="0" style="width: 968px"> 
<tr> 
<td valign="top" width="425px" style=""> 
<a href="http://www.GasBuddy.com/"><img id="imgHeadbar" alt="" src="../images/logos/gasbuddy_logo.gif" width="425" height="58" /></a> 

</td> 
<td class="gb_h_search"> 

</td> 
<td class="gb_h_class"> 
<div> 

</div> 
</td> 
</tr> 
</table> 
</div> 



<style type="text/css"> 
#s_n_home a { 
background-image: url(/images/menu/tp_sp.png); 
background-position: 1px 1px; 
background-repeat: no-repeat; 
vertical-align: bottom; 
padding-left: 24px; 
} 

#subnavi2 .s_n_feat { 
padding: 0 3px; 
} 

#s_n_home li.s_n_feat, #s_n_home li.s_n_feat:hover {background-position: 0px 0px;} 
#s_n_home a.s_n_feat_map {background-position: 4px 5px;} 
#s_n_home a.s_n_feat_tc {background-position: 4px -27px} 
#s_n_home a.s_n_feat_log {background-position: 3px -62px;} 
#s_n_home a.s_n_feat_chart {background-position: 4px -133px;} 
#s_n_home a.s_n_feat_prize {background-position: 4px -98px;} 
#s_n_home a.s_n_feat_tip {background-position: 3px -164px;} 
#s_n_home a.s_n_feat_blog {background-position: 3px -194px;} 
</style> 

<div id="navi2"> 
<ul> 
<li id="n_home"><a href="http://www.gasbuddy.com/">Home</a><span></span></li> 

<li id="n_blog"><a href="http://blog.gasbuddy.com/" target="_blank">Blog</a><span></span></li> 
<li id="n_gas" class="n_sel"><a href="/GB_Price_List.aspx">Gas Prices</a><span></span></li> 
<li id="n_charts"><a href="/gb_retail_price_chart.aspx?time=24">Price Charts</a><span></span></li> 
<li id="n_maps"><a href="/gb_gastemperaturemap.aspx">Gas Price Maps</a><span></span></li> 
<li id="n_points"><a href="/GB_Contest_Info.aspx?cntry=GB">Points &amp; Prizes</a><span></span></li> 
<li id="n_wireless"><a href="/GasBuddyMobileApps.aspx">Mobile Apps</a><span></span></li> 
<li id="n_media"><a href="http://media.gasbuddy.com/">Media</a><span></span></li> 

<li id="n_help"><a href="/gb_contact.aspx">Contact</a><span></span></li> 

<li id="n_advertise"><a href="/GB_AdvertiseWithUs.aspx">Advertise with us</a></li> 

</ul> 
</div> 

<div id="subnavi2"> 
<div id="s_n_home"> 
<ul> 
<li class="s_n_feat">Top Features:</li> 
<li><a href="/gb_gastemperaturemap.aspx" class="s_n_feat_map">Gas Price Heat Map</a><span></span></li> 
<li><a href="/Trip_Calculator.aspx" class="s_n_feat_tc">Trip Cost Calculator</a><span></span></li> 
<li><a href="/gb_retail_price_chart.aspx?time=24" class="s_n_feat_chart">Gas Price Charts</a><span></span></li> 
<li><a href="http://blog.gasbuddy.com/" target="_blank" class="s_n_feat_blog">GasBuddy Blog</a><span></span></li> 
<li><a href="GB_Contest_Info.aspx" class="s_n_feat_prize">Win Prizes</a><span></span></li> 
<li><a href="/GB_Fuel_Save.aspx" class="s_n_feat_tip">Fuel Saving Tips</a></li> 

</ul> 
</div> 

<div id="s_n_gas" class="s_n_on"> 
<ul> 
<li><a href="/Trip_Calculator.aspx">Trip Cost Calculator</a><span></span></li> 

<li><a href="/GB_StateList.aspx">Gas Prices by State/Province</a><span></span></li> 

<li><a href="/GB_Price_List.aspx">City &amp; State Averages</a><span></span></li> 

<li><a href="/GB_Fuel_Save.aspx">Fuel Saving Tips</a></li> 

</ul> 
</div> 

<div id="s_n_charts"> 
<ul> 
<li><a href="/gb_retail_price_chart.aspx?time=1">Past Month</a><span></span></li> 

<li><a href="/gb_retail_price_chart.aspx?time=12">Past Year</a><span></span></li> 

<li><a href="/gb_retail_price_chart.aspx?time=24">Past Two Years</a></li> 

</ul> 
</div> 

<div id="s_n_maps"> 
<ul> 
<li><a href="/GB_Map_Gas_Prices.aspx">Map Gas Prices</a><span></span></li> 

<li><a href="/gb_gastemperaturemap.aspx">Gas Price Heat Maps</a></li> 

</ul> 
</div> 

<div id="s_n_points"> 
<ul> 
<li><a href="/GB_Contest_Info.aspx?cntry=GB">Prize Give-away</a><span></span></li> 

<li><a href="/GB_Contest_Winners.aspx">Recent Winners</a><span></span></li> 

<li><a href="/GB_Choose_Site.aspx">Get Entries</a></li> 

</ul> 
</div> 

<div id="s_n_wireless"> 
<ul> 

<li><a href="/GasBuddyiPhoneApp.aspx">iPhone</a><span></span></li> 

<li><a href="/GasBuddyAndroidApp.aspx">Android</a><span></span></li> 

<li><a href="/GasBuddyWindowsPhoneApp.aspx">Windows Phone</a><span></span></li> 

<li><a href="/GasBuddyMobileApps.aspx#MobileWeb">Mobile Web</a><span></span></li> 

<li><a href="/GasBuddyBlackBerryApp.aspx">BlackBerry</a></li> 

</ul> 
</div> 

<div id="s_n_media"> 
<ul> 
<li><a href="http://media.gasbuddy.com/">Media Story Ideas</a></li> 

</ul> 
</div> 

<div id="s_n_help"> 
<ul> 
<li><a href="/gb_contact.aspx">Contact Us</a><span></span></li> 

<li><a href="http://media.gasbuddy.com/#ContactUs">Media Inquiries</a><span></span></li> 

<li><a href="/gb_aboutus.aspx">About Us</a></li> 

</ul> 
</div> 

<div id="s_n_blog"> 
<ul> 
<li><a href="http://blog.gasbuddy.com/" target="_blank">Recent Blog Posts</a></li> 

</ul> 
</div> 
<div id="s_n_advertise"> 
<ul> 
<li><a href="/GB_AdvertiseWithUs.aspx">Advertise with us</a></li> 

</ul> 
</div> 

<div id="s_n_fuel"> 
<ul> 
<li><a href="/Pricelock.aspx">Control your business' fuel costs</a><span></span></li> 

<li><a href="/PricelockHowItWorks.aspx">Get paid when fuel prices increase</a></li> 

</ul> 
</div> 
</div> 
<script type="text/javascript"> 
var gb_m = new gb_Menu('navi2', 'subnavi2', 250, 15000, 'n_gas', false); 
</script> 


<div class="clearfix"> 
<div class="main_col"> 
<div class="main_boxGB"> 


<div id="div728"></div> 



<style type="text/css"> 
.listing { 
width: 100%; 
border: 1px solid #e2e2e2; 
} 

.listing td { 
font-size: 18px; 
color: #666; 
padding: 5px 5px; 
border-bottom: 1px solid #e2e2e2; 
} 

.listing a { 
color: #33528A; 
text-decoration: none; 
} 

.listing a:hover { 
text-decoration: underline; 
} 

.listing thead tr { 
background: #f2f2f2; 
} 

.listing tbody td:first-child { 
width: 500px; 
text-align: left; 
} 

.listing tbody tr:last-child td { 
border: 0; 
} 

.listing .p { 
text-align: right; 
padding: 0 30px 0 0; 
} 

.listing .up { 
color: #D5111B; 
} 

.listing .down { 
color: #339900; 
} 

.listing .gpd { 
padding-left: 30px; 
background: transparent url(/images/art/gpd_logo_sm.png) no-repeat 5px 50%; 
} 

.listing_nav { 
margin: 0 0 10px; 
padding: 0; 
overflow: hidden; 
width: 800px; 
} 

.listing_nav li { 
float: left; 
list-style: none none outside; 
width: 25%; 
} 

.listing_nav a { 
text-align: center; 
vertical-align: middle; 
padding: 20px 0; 
font-size: 18px; 
text-decoration: none; 
border: 1px solid #e2e2e2; 
border-right: 0; 
display: block; 

} 

.listing_nav a:hover { 
background: #f2f2f2; 
text-decoration: underline; 
} 

.listing_nav li:last-child a { 
border-right: 1px solid #e2e2e2; 
} 

</style> 


<ul class="listing_nav"> 
<li> 
<a href="/GB_Price_List.aspx?cntry=USA">US States</a> 

</li> 
<li> 
<a href="/GB_Price_List.aspx?cntry=USA#us_cities">US Cities</a> 

</li> 
<li> 
<a href="/GB_Price_List.aspx?cntry=CAN">Canadian Provinces</a> 


</li> 
<li> 
<a href="/GB_Price_List.aspx?cntry=CAN#can_cities">Canadian Cities</a> 

</li> 
</ul> 




<div id="ctl00_Content_GBFPL_pnlCanada"> 

<a name="can"></a> 
<div style="margin: 10px 0;"> 

<table class="listing" cellpadding="0" cellspacing="0"> 
<thead> 
<tr> 
<td colspan="4"> 
Average Regular Gas Price By Canadian Province 
</td> 
</tr> 
</thead> 
<tbody> 

<tr> 
<td> 
<a href="http://www.Albertagasprices.com" target="_blank"> 

Alberta 
</a> 
</td> 
<td class="p"> 
117.6 
</td> 
<td class="p down"> 
-0.1 
</td> 
<td> 
<img src="/images/art/sm_trend_flat.gif" alt="" /> 

</td> 
</tr> 

<tr> 
<td> 
<a href="http://www.Manitobagasprices.com" target="_blank"> 

Manitoba 
</a> 
</td> 
<td class="p"> 
123.5 
</td> 
<td class="p down"> 
-0.4 
</td> 
<td> 
<img src="/images/art/sm_trend_down.gif" alt="" /> 

</td> 
</tr> 

<tr> 
<td> 
<a href="http://www.Saskgasprices.com" target="_blank"> 

Saskatchewan 
</a> 
</td> 
<td class="p"> 
126.3 
</td> 
<td class="p"> 
0.0 
</td> 
<td> 
<img src="/images/art/sm_trend_flat.gif" alt="" /> 

</td> 
</tr> 

<tr> 
<td> 
<a href="http://www.NewBrunswickgasprices.com" target="_blank"> 

New Brunswick 
</a> 
</td> 
<td class="p"> 
130.9 
</td> 
<td class="p down"> 
-0.1 
</td> 
<td> 
<img src="/images/art/sm_trend_flat.gif" alt="" /> 

</td> 
</tr> 

<tr> 
<td> 
<a href="http://www.Ontariogasprices.com" target="_blank"> 

Ontario 
</a> 
</td> 
<td class="p"> 
132.6 
</td> 
<td class="p up"> 
+0.3 
</td> 
<td> 
<img src="/images/art/sm_trend_flat.gif" alt="" /> 

</td> 
</tr> 

<tr> 
<td> 
<a href="http://www.PEIgasprices.com" target="_blank"> 

And so on... only about a fifth of the whole page is in the string...

So, is there any way to get everything on this GasBuddy page into a single string?

1 Answer:

Answer 0: (score: 0)

Set the user agent:

userAgent("Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36")
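
The userAgent(...) call above matches the method of that name on jsoup's Connection, so with jsoup it would be chained as Jsoup.connect(url).userAgent(...).get(). As a rough sketch of the same idea without jsoup, the example below sends the header with plain java.net.HttpURLConnection instead. The class name UserAgentFetch, its method names, and the UTF-8 charset are assumptions made for illustration; only the User-Agent string comes from this answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class UserAgentFetch {
    // Browser-like User-Agent string copied from the answer above.
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36";

    // Prepares a GET request carrying the User-Agent header.
    // Nothing is sent until the response is actually read.
    static HttpURLConnection open(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent", USER_AGENT);
        return conn;
    }

    // Reads the whole response body into one String.
    // UTF-8 is an assumption; the real page may declare another charset.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = open(url);
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return sb.toString();
    }
}
```

On Android, UserAgentFetch.fetch("http://www.gasbuddy.com/GB_Price_List.aspx") would have to run off the main thread, for example inside doInBackground. Whether the server really returns a different page for a different User-Agent is not confirmed here.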