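When I try to pull all of the HTML from http://www.gasbuddy.com/GB_Price_List.aspx, only about half of the page ends up in my String.
I have tried several approaches found on SO and through Google searches, but none of them solved my problem.
Here is the code I use to retrieve the page: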
    private class InternetGasBuddyConnection extends AsyncTask<String, String, String> {
        protected String doInBackground(String... urls) {
            StringBuilder response = new StringBuilder(30000);
            DefaultHttpClient client = new DefaultHttpClient();
            HttpGet httpGet = new HttpGet(URL);
            String result = "";
            try {
                HttpResponse execute = client.execute(httpGet);
                InputStream content = execute.getEntity().getContent();
                BufferedReader buffer = new BufferedReader(new InputStreamReader(content));
                String s = "";
                while ((s = buffer.readLine()) != null)
                    response.append(s);
                Log.d("before changing and parsing", response.toString());
                Document doc = Jsoup.parse(response.toString(), URL);
                result = response.toString();
                Log.d("no parsing", result.toString());
                result = doc.toString();
                Log.d("after parsing", result);
            } catch (Exception e) {
                Log.e("Darrell", result, e);
                e.printStackTrace();
            }
            return result;
        }

        @Override
        public void onPostExecute(final String result) {
            Log.d("onPostExcecute()", result);
            htmlDoc = result;
        }
    }
When the Log.d("after parsing", result); call runs, my logcat shows:
07-23 13:19:57.833: D/after parsing(32136): <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
07-23 13:19:57.833: D/after parsing(32136): <html xmlns="http://www.w3.org/1999/xhtml">
07-23 13:19:57.833: D/after parsing(32136): <head>
07-23 13:19:57.833: D/after parsing(32136): <title>USA and Canada Current Average Gas Prices By City/State/Province - GasBuddy.com</title>
07-23 13:19:57.833: D/after parsing(32136): <base id="ctl00_head_base" href="http://www.gasbuddy.com/" />
07-23 13:19:57.833: D/after parsing(32136): <script type="text/javascript" src="/js/menu_v3.js?q=11"></script>
07-23 13:19:57.833: D/after parsing(32136): <link href="/Style.css" rel="Stylesheet" />
07-23 13:19:57.833: D/after parsing(32136): <link href="/css/main.css?q=13" rel="Stylesheet" />
07-23 13:19:57.833: D/after parsing(32136): <meta http-equiv="pragma" content="no-cache" />
07-23 13:19:57.833: D/after parsing(32136): <link rel="shortcut icon" href="/favicon.ico" />
07-23 13:19:57.833: D/after parsing(32136): <!--[if lt IE 7]> <link href="/css/main_ie6.css?q=11" rel="Stylesheet" /> <![endif]-->
07-23 13:19:57.833: D/after parsing(32136): <!-- PUT THIS TAG IN THE head SECTION -->
07-23 13:19:57.833: D/after parsing(32136): <script type="text/javascript" src="http://partner.googleadservices.com/gampad/google_service.js"></script>
07-23 13:19:57.833: D/after parsing(32136): <script type="text/javascript"> GS_googleAddAdSenseService("ca-pub-9634286501775085"); GS_googleEnableAllServices();</script>
07-23 13:19:57.833: D/after parsing(32136): <script type="text/javascript"> var site; var siteleft; var siteright; var site_length; site="GasBuddy".toLowerCase(); site_length=site.length; siteleft=site.substring(0,4); siteright=site.substring(site_length-4,site_length); site = siteleft + siteright; GA_googleAddAttr("GasPri_URL", site); </script>
07-23 13:19:57.833: D/after parsing(32136): <script type="text/javascript"> GA_googleAddSlot("ca-pub-9634286501775085", "GasBuddy_Content_Top_728x90"); GA_googleAddSlot("ca-pub-9634286501775085", "GasBuddy_Content_Top_160x600"); GA_googleAddSlot("ca-pub-9634286501775085", "GasBuddy_Content_160x600_Bottom");</script>
07-23 13:19:57.833: D/after parsing(32136): <script type="text/javascript"> GA_googleFetchAds();</script>
07-23 13:19:57.833: D/after parsing(32136): <!-- END OF TAG FOR head SECTION -->
07-23 13:19:57.833: D/after parsing(32136): <script type="text/javascript"> function getElementPosition(offsetTrail){ var offsetLeft = 0; var offsetTop = 0; while (offsetTrail){ offsetLeft += offsetTrail.offsetLeft; offsetTop += offsetTrail.offsetTop; offsetTrail = offsetTrail.offsetParent; } if (navigator.userAgent.indexOf('Mac') != -1 && typeof document.body.leftMargin != 'undefined'){ offsetLeft += document.body.leftMargin; offsetTop += document.body.topMargin; } return {left:offsetLeft,top:offsetTop};}var ad_containers = [['divSkyscraper', 'divSky'], ['divLeaderboard','div728']];function moveAd() { var i = 0; for (i=0; i<ad_containers.length; i++){ if (document.getElementById(ad_containers[i][0])){ document.getElementById(ad_containers[i][0]).style.display = 'block'; document.getElementById(ad_containers[i][0]).style.position='absolute'; document.getElementById(ad_containers[i][0]).style.top=getElementPosition(document.getElementById(ad_containers[i][1])).top+"px"; document.getElementById(ad_containers[i][0]).style.left=getElementPosition(document.getElementById(ad_containers[i][1])).left+"px"; } } } </script>
07-23 13:19:57.833: D/after parsing(32136): <script type="text/javascript">window._addWindowOnResize = function (func){if (typeof window.onresize == 'function'){var oldFunc = window.onresize;window.onresize = function() { oldFunc(); func(); }}else{window.onresize = func;}}</script>
07-23 13:19:57.833: D/after parsing(32136): </head>
07-23 13:19:57.833: D/after parsing(32136): <body>
07-23 13:19:57.833: D/after parsing(32136): <input id="adcoord" type="hidden" value="" />
07-23 13:19:57.833: D/after parsing(32136): <input name="ctl00$serveradcoord" type="hidden" id="ctl00_serveradcoord" />
07-23 13:19:57.833: D/after parsing(32136): <script type="text/javascript"> document.getElementById('adcoord').value= document.getElementById('ctl00_serveradcoord').value; </script>
07-23 13:19:57.833: D/after parsing(32136): <form name="aspnetForm" method="post" action="GB_Price_List.aspx" id="aspnetForm">
07-23 13:19:57.833: D/after parsing(32136): <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUJODA2MjYxNTkwD2QWAmYPZBYEAgIPFgIeBGhyZWYFGGh0dHA6Ly93d3cuZ2FzYnVkZHkuY29tL2QCBQ9kFgRmDxYCHgRUZXh0BXk8aW5wdXQgaWQ9ImFkY29vcmQiIHR5cGU9ImhpZGRlbiIgdmFsdWU9ImxhdD00NC45NzIwODQ5MTYxMDUmYW1wO2xuZz0tOTMuMjU1Mzg2MzUyNTM5JmFtcDtydD1zY3JpcHQmYW1wO2NiPTQzMDA0MDc2OD4iIC8+ZAIJDxYCHwFlZGQFCm9rVVAnH00/+plv75hl5Bjosg==" />
07-23 13:19:57.833: D/after parsing(32136): <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWAgLon7DQAgL9t/yMB9Wx1nzDGAm5Ha
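As you can see, the last line is not the closing </html> tag.
Why isn't the whole page put into the String?
The rest of the HTML page continues as follows (only part of it, because I hit the post body limit...):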
</div>
<div>
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWAgK6gqnzAwL9t/yMBxUlbb+cRAiGT6aQPAFgtAU0IQZv" />
</div>
<input id="adcoord" type="hidden" value="lat=44.972084916105&lng=-93.255386352539&rt=script&cb=257130208>" />
<div id="main_wrapper" >
<style type="text/css">
a.social {
background-image: url(/images/art/social_small_sp.png);
background-repeat: no-repeat;
padding: 3px 0px 3px 25px;
margin: 0px 10px 0px 0px;
text-decoration: underline;
}
a.social:hover {
text-decoration: underline;
}
a.fb { background-position: 0px 2px;}
a.tw { background-position: 0px -35px;}
</style>
<div style="font-size: 12px; height: 21px;">
<div style="float: left; padding-top: 3px;">
<a id="ctl00_GBTP_HyperLink1" href="Registration.aspx">[Become A Member]</a>
<a id="ctl00_GBTP_HyperLink2" href="GB_Mem_log_in.aspx">[Log In]</a>
</div>
<div style="float: right; padding-top: 3px;">
<b>Follow Us</b>
<a href="http://www.facebook.com/gasbuddy" target="_blank" class="social fb">Facebook</a>
<a href="http://twitter.com/gasbuddy" target="_blank" class="social tw">Twitter</a>
</div>
</div>
<style type="text/css">
td.gb_h_search {
width: 240px;
vertical-align: bottom;
padding-bottom: 5px;
font-size: 0px;
}
td.gb_h_search span {
font-weight: bold;
color: #555555;
font-size: 17px;
}
td.gb_h_search div {
margin-top: 2px;
}
td.gb_h_class {
vertical-align: bottom;
padding-bottom: 5px;
}
td.gb_h_class a {
font-weight: bold;
padding-left: 20px;
}
</style>
<div id="header" onkeydown="return txtSearch_click(event);">
<table cellspacing="0" cellpadding="0" border="0" style="width: 968px">
<tr>
<td valign="top" width="425px" style="">
<a href="http://www.GasBuddy.com/"><img id="imgHeadbar" alt="" src="../images/logos/gasbuddy_logo.gif" width="425" height="58" /></a>
</td>
<td class="gb_h_search">
</td>
<td class="gb_h_class">
<div>
</div>
</td>
</tr>
</table>
</div>
<style type="text/css">
#s_n_home a {
background-image: url(/images/menu/tp_sp.png);
background-position: 1px 1px;
background-repeat: no-repeat;
vertical-align: bottom;
padding-left: 24px;
}
#subnavi2 .s_n_feat {
padding: 0 3px;
}
#s_n_home li.s_n_feat, #s_n_home li.s_n_feat:hover {background-position: 0px 0px;}
#s_n_home a.s_n_feat_map {background-position: 4px 5px;}
#s_n_home a.s_n_feat_tc {background-position: 4px -27px}
#s_n_home a.s_n_feat_log {background-position: 3px -62px;}
#s_n_home a.s_n_feat_chart {background-position: 4px -133px;}
#s_n_home a.s_n_feat_prize {background-position: 4px -98px;}
#s_n_home a.s_n_feat_tip {background-position: 3px -164px;}
#s_n_home a.s_n_feat_blog {background-position: 3px -194px;}
</style>
<div id="navi2">
<ul>
<li id="n_home"><a href="http://www.gasbuddy.com/">Home</a><span></span></li>
<li id="n_blog"><a href="http://blog.gasbuddy.com/" target="_blank">Blog</a><span></span></li>
<li id="n_gas" class="n_sel"><a href="/GB_Price_List.aspx">Gas Prices</a><span></span></li>
<li id="n_charts"><a href="/gb_retail_price_chart.aspx?time=24">Price Charts</a><span></span></li>
<li id="n_maps"><a href="/gb_gastemperaturemap.aspx">Gas Price Maps</a><span></span></li>
<li id="n_points"><a href="/GB_Contest_Info.aspx?cntry=GB">Points & Prizes</a><span></span></li>
<li id="n_wireless"><a href="/GasBuddyMobileApps.aspx">Mobile Apps</a><span></span></li>
<li id="n_media"><a href="http://media.gasbuddy.com/">Media</a><span></span></li>
<li id="n_help"><a href="/gb_contact.aspx">Contact</a><span></span></li>
<li id="n_advertise"><a href="/GB_AdvertiseWithUs.aspx">Advertise with us</a></li>
</ul>
</div>
<div id="subnavi2">
<div id="s_n_home">
<ul>
<li class="s_n_feat">Top Features:</li>
<li><a href="/gb_gastemperaturemap.aspx" class="s_n_feat_map">Gas Price Heat Map</a><span></span></li>
<li><a href="/Trip_Calculator.aspx" class="s_n_feat_tc">Trip Cost Calculator</a><span></span></li>
<li><a href="/gb_retail_price_chart.aspx?time=24" class="s_n_feat_chart">Gas Price Charts</a><span></span></li>
<li><a href="http://blog.gasbuddy.com/" target="_blank" class="s_n_feat_blog">GasBuddy Blog</a><span></span></li>
<li><a href="GB_Contest_Info.aspx" class="s_n_feat_prize">Win Prizes</a><span></span></li>
<li><a href="/GB_Fuel_Save.aspx" class="s_n_feat_tip">Fuel Saving Tips</a></li>
</ul>
</div>
<div id="s_n_gas" class="s_n_on">
<ul>
<li><a href="/Trip_Calculator.aspx">Trip Cost Calculator</a><span></span></li>
<li><a href="/GB_StateList.aspx">Gas Prices by State/Province</a><span></span></li>
<li><a href="/GB_Price_List.aspx">City & State Averages</a><span></span></li>
<li><a href="/GB_Fuel_Save.aspx">Fuel Saving Tips</a></li>
</ul>
</div>
<div id="s_n_charts">
<ul>
<li><a href="/gb_retail_price_chart.aspx?time=1">Past Month</a><span></span></li>
<li><a href="/gb_retail_price_chart.aspx?time=12">Past Year</a><span></span></li>
<li><a href="/gb_retail_price_chart.aspx?time=24">Past Two Years</a></li>
</ul>
</div>
<div id="s_n_maps">
<ul>
<li><a href="/GB_Map_Gas_Prices.aspx">Map Gas Prices</a><span></span></li>
<li><a href="/gb_gastemperaturemap.aspx">Gas Price Heat Maps</a></li>
</ul>
</div>
<div id="s_n_points">
<ul>
<li><a href="/GB_Contest_Info.aspx?cntry=GB">Prize Give-away</a><span></span></li>
<li><a href="/GB_Contest_Winners.aspx">Recent Winners</a><span></span></li>
<li><a href="/GB_Choose_Site.aspx">Get Entries</a></li>
</ul>
</div>
<div id="s_n_wireless">
<ul>
<li><a href="/GasBuddyiPhoneApp.aspx">iPhone</a><span></span></li>
<li><a href="/GasBuddyAndroidApp.aspx">Android</a><span></span></li>
<li><a href="/GasBuddyWindowsPhoneApp.aspx">Windows Phone</a><span></span></li>
<li><a href="/GasBuddyMobileApps.aspx#MobileWeb">Mobile Web</a><span></span></li>
<li><a href="/GasBuddyBlackBerryApp.aspx">BlackBerry</a></li>
</ul>
</div>
<div id="s_n_media">
<ul>
<li><a href="http://media.gasbuddy.com/">Media Story Ideas</a></li>
</ul>
</div>
<div id="s_n_help">
<ul>
<li><a href="/gb_contact.aspx">Contact Us</a><span></span></li>
<li><a href="http://media.gasbuddy.com/#ContactUs">Media Inquiries</a><span></span></li>
<li><a href="/gb_aboutus.aspx">About Us</a></li>
</ul>
</div>
<div id="s_n_blog">
<ul>
<li><a href="http://blog.gasbuddy.com/" target="_blank">Recent Blog Posts</a></li>
</ul>
</div>
<div id="s_n_advertise">
<ul>
<li><a href="/GB_AdvertiseWithUs.aspx">Advertise with us</a></li>
</ul>
</div>
<div id="s_n_fuel">
<ul>
<li><a href="/Pricelock.aspx">Control your business' fuel costs</a><span></span></li>
<li><a href="/PricelockHowItWorks.aspx">Get paid when fuel prices increase</a></li>
</ul>
</div>
</div>
<script type="text/javascript">
var gb_m = new gb_Menu('navi2', 'subnavi2', 250, 15000, 'n_gas', false);
</script>
<div class="clearfix">
<div class="main_col">
<div class="main_boxGB">
<div id="div728"></div>
<style type="text/css">
.listing {
width: 100%;
border: 1px solid #e2e2e2;
}
.listing td {
font-size: 18px;
color: #666;
padding: 5px 5px;
border-bottom: 1px solid #e2e2e2;
}
.listing a {
color: #33528A;
text-decoration: none;
}
.listing a:hover {
text-decoration: underline;
}
.listing thead tr {
background: #f2f2f2;
}
.listing tbody td:first-child {
width: 500px;
text-align: left;
}
.listing tbody tr:last-child td {
border: 0;
}
.listing .p {
text-align: right;
padding: 0 30px 0 0;
}
.listing .up {
color: #D5111B;
}
.listing .down {
color: #339900;
}
.listing .gpd {
padding-left: 30px;
background: transparent url(/images/art/gpd_logo_sm.png) no-repeat 5px 50%;
}
.listing_nav {
margin: 0 0 10px;
padding: 0;
overflow: hidden;
width: 800px;
}
.listing_nav li {
float: left;
list-style: none none outside;
width: 25%;
}
.listing_nav a {
text-align: center;
vertical-align: middle;
padding: 20px 0;
font-size: 18px;
text-decoration: none;
border: 1px solid #e2e2e2;
border-right: 0;
display: block;
}
.listing_nav a:hover {
background: #f2f2f2;
text-decoration: underline;
}
.listing_nav li:last-child a {
border-right: 1px solid #e2e2e2;
}
</style>
<ul class="listing_nav">
<li>
<a href="/GB_Price_List.aspx?cntry=USA">US States</a>
</li>
<li>
<a href="/GB_Price_List.aspx?cntry=USA#us_cities">US Cities</a>
</li>
<li>
<a href="/GB_Price_List.aspx?cntry=CAN">Canadian Provinces</a>
</li>
<li>
<a href="/GB_Price_List.aspx?cntry=CAN#can_cities">Canadian Cities</a>
</li>
</ul>
<div id="ctl00_Content_GBFPL_pnlCanada">
<a name="can"></a>
<div style="margin: 10px 0;">
<table class="listing" cellpadding="0" cellspacing="0">
<thead>
<tr>
<td colspan="4">
Average Regular Gas Price By Canadian Province
</td>
</tr>
</thead>
<tbody>
<tr>
<td>
<a href="http://www.Albertagasprices.com" target="_blank">
Alberta
</a>
</td>
<td class="p">
117.6
</td>
<td class="p down">
-0.1
</td>
<td>
<img src="/images/art/sm_trend_flat.gif" alt="" />
</td>
</tr>
<tr>
<td>
<a href="http://www.Manitobagasprices.com" target="_blank">
Manitoba
</a>
</td>
<td class="p">
123.5
</td>
<td class="p down">
-0.4
</td>
<td>
<img src="/images/art/sm_trend_down.gif" alt="" />
</td>
</tr>
<tr>
<td>
<a href="http://www.Saskgasprices.com" target="_blank">
Saskatchewan
</a>
</td>
<td class="p">
126.3
</td>
<td class="p">
0.0
</td>
<td>
<img src="/images/art/sm_trend_flat.gif" alt="" />
</td>
</tr>
<tr>
<td>
<a href="http://www.NewBrunswickgasprices.com" target="_blank">
New Brunswick
</a>
</td>
<td class="p">
130.9
</td>
<td class="p down">
-0.1
</td>
<td>
<img src="/images/art/sm_trend_flat.gif" alt="" />
</td>
</tr>
<tr>
<td>
<a href="http://www.Ontariogasprices.com" target="_blank">
Ontario
</a>
</td>
<td class="p">
132.6
</td>
<td class="p up">
+0.3
</td>
<td>
<img src="/images/art/sm_trend_flat.gif" alt="" />
</td>
</tr>
<tr>
<td>
<a href="http://www.PEIgasprices.com" target="_blank">
And so on... Only about one fifth of the whole page is in the String...
Is there any way to get everything on this GasBuddy page into a single String?
Answer 0 (score: 0)
Set the user agent on the request:
userAgent("Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36")
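The `userAgent(...)` call above looks like the method of that name on jsoup's `Connection` (as in `Jsoup.connect(url).userAgent(...).get()`), though the answer does not say so. As a minimal sketch of the same idea without any extra library, the following uses `HttpURLConnection` from the standard library instead: it sends the request with a desktop browser `User-Agent` header and reads the entire response body into one String. The class name `PageFetcher`, the method names, the timeouts, and the UTF-8 charset are assumptions made for illustration, not part of the original post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
final class PageFetcher {

    // User agent string taken from the answer above.
    static final String DESKTOP_UA =
            "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36";

    // Reads the stream to the end in fixed-size chunks, so line breaks
    // are preserved (readLine() would strip them).
    static String readFully(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Fetches pageUrl with the given User-Agent header and returns the body.
    // Assumption: the page is served as UTF-8.
    static String fetch(String pageUrl, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setConnectTimeout(10_000); // assumed timeout, in milliseconds
        conn.setReadTimeout(10_000);    // assumed timeout, in milliseconds
        try (InputStream in = conn.getInputStream()) {
            return readFully(in, StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

On Android this would have to run off the main thread, for example inside `doInBackground` as in the question: `String html = PageFetcher.fetch(urls[0], PageFetcher.DESKTOP_UA);`. Checking `html.length()` afterwards is a quick way to confirm whether the whole page arrived.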