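I'm trying to parse information from a website, but it only works when the content is short. As the HTML gets larger, the loaded content is incomplete. The retrieved String is about 40,000 characters long in total, and the count differs on every retrieval (for example, 31345 the first time and 31358 the next), so I can't retrieve the whole page.
I therefore suspected the problem had to do with the Internet connection or with buffering. But I'm already using a BufferedReader, and as far as I know HttpURLConnection works like a stream, so there shouldn't be any problem. I've checked almost every page related to URLConnection, but nobody mentions this.
Is there something wrong with my code? I've been working on this for days, and any suggestion would be very helpful. Thanks in advance.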
public String getHtmlFromUrl(String url, int startReadingLine) {
    String xml = "";
    try {
        // URL url1 = new URL(url);
        URL url1 = new URL("http://support.google.com/analytics/bin/answer.py?hl=zh-Hant&answer=1009602");
        HttpURLConnection urlConn = (HttpURLConnection) url1.openConnection();
        urlConn.setRequestProperty("User-Agent",
                "Mozilla/5.0 (Windows NT 6.1;zh-tw; MSIE 6.0)");
        // Disable keep-alive on Android versions before Froyo.
        if (Integer.parseInt(Build.VERSION.SDK) < Build.VERSION_CODES.FROYO) {
            System.setProperty("http.keepAlive", "false");
        }
        urlConn.setReadTimeout(10000 /* milliseconds */);
        urlConn.setConnectTimeout(15000 /* milliseconds */);
        urlConn.setDoOutput(true);
        urlConn.setDoInput(true);
        urlConn.setRequestMethod("GET");
        urlConn.setUseCaches(false);
        InputStreamReader in = new InputStreamReader(
                urlConn.getInputStream());
        BufferedReader buffer = new BufferedReader(in, 100000);
        StringBuilder builder = new StringBuilder();
        String aux = "";
        // readLine() strips line terminators, so the result has no newlines.
        while ((aux = buffer.readLine()) != null)
            builder.append(aux);
        xml = builder.toString();
        in.close();
        urlConn.disconnect();
    } catch (SocketTimeoutException e) {
        return "time out";
    } catch (IOException e) {
        e.printStackTrace();
    }
    // return XML
    return xml;
}
Here is a sample of xml (count: 40710):
(I did not add the "..." at the end of xml myself.)
<!DOCTYPE html><html lang="zh-Hant"class="streamlined streamlined-3"><head><script type="text/javascript">serverResponseTimeDelta=window.external&&window.external.pageT?window.external.pageT:-1;pageStartTime=new Date().getTime...
...
..."納米比亞", "NR": "諾魯", "NP": "尼泊爾", "NL": "荷蘭", "AN": "荷屬安地列斯", "KN": "尼維斯", "NC": "新喀里多尼亞", "NI": "尼加拉瓜", "NE": "尼日", "NG": "奈及利亞", "NU": "紐埃", "KR": "北韓", "NO": "挪威", "NZ": "紐西蘭", "OM": "阿曼", "PW": "帛琉", "PK": "巴基斯坦", "PS": "巴勒斯坦", "PA": "巴拿馬", "PG": "巴布亞新幾內亞", "PY": "巴拉圭", "PE": "秘魯", "PH"...
Another one (count: 41106):
<!DOCTYPE html><html lang="zh-Hant"class="streamlined streamlined-3"><head><script type="text/javascript">serverResponseTimeDelta=window.external&&window.external.pageT?window.externa...
...
...屬安地列斯", "KN": "尼維斯", "NC": "新喀里多尼亞", "NI": "尼加拉瓜", "NE": "尼日", "NG": "奈及利亞", "NU": "紐埃", "KR": "北韓", "NO": "挪威", "NZ": "紐西蘭", "OM": "阿曼", "PW": "帛琉", "PK": "巴基斯坦", "PS": "巴勒斯坦", "PA": "巴拿馬", "PG": "巴布亞新幾內亞", "PY": "巴拉圭", "PE": "秘魯", "PH"...
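Edit: So far I think it has to do with how the code interacts with the Internet, since the count differs for each result, or it may be some strange bug on my device. The root cause has not been found. The strangest part is that the result ends with "...", as if it knows the result is not complete.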
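Answer 0 (score: 1)
Always try writing your input to an external file and look at what you actually received! I had the same problem on Android. In the end, it turned out that logcat was not showing me the whole String!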
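To make that advice concrete, here is a minimal plain-Java sketch: it writes the full string to a file and also splits it into pieces so that no single log line is long enough to be cut off. The class name `ResponseDump`, its method names, and the 4000-character chunk size are assumptions for illustration only. The logcat per-entry limit is commonly reported as roughly 4,000 characters but may differ between Android versions, and on a device you would pass each chunk to `Log.d` and write to app storage instead of a temp file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ResponseDump {

    // Splits text into pieces of at most chunkSize characters so that each
    // piece can be logged on its own line.
    public static List<String> chunk(String text, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < text.length(); start += chunkSize) {
            int end = Math.min(text.length(), start + chunkSize);
            parts.add(text.substring(start, end));
        }
        return parts;
    }

    // Writes the full text to a file as UTF-8 so it can be inspected
    // outside the log viewer.
    public static void dumpToFile(String text, Path target) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the downloaded page: 10000 characters of filler.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append('x');
        }
        String html = sb.toString();

        Path out = Files.createTempFile("response", ".html");
        dumpToFile(html, out);
        System.out.println("length in memory: " + html.length());
        System.out.println("bytes on disk: " + Files.size(out));

        // Print the size of each chunk; on Android, log each chunk instead.
        for (String part : chunk(html, 4000)) {
            System.out.println("chunk of " + part.length() + " chars");
        }
    }
}
```

If the length in memory and the file size look right while the log output ends early, the download is complete and only the display is truncated.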
Answer 1 (score: 0)
You can try the following code.
BufferedInputStream bis = new BufferedInputStream(in);
ByteArrayOutputStream buf = new ByteArrayOutputStream();
int result = bis.read();
while (result != -1) {
    byte b = (byte) result;
    buf.write(b);
    result = bis.read();
}
return buf.toString();
Alternatively:
Writer writer = new StringWriter();
char[] buffer = new char[1024];
try {
    Reader reader = new BufferedReader(
            new InputStreamReader(is, "UTF-8"));
    int n;
    while ((n = reader.read(buffer)) != -1) {
        writer.write(buffer, 0, n);
    }
} finally {
    is.close();
}
return writer.toString();
The last approach, which is the one I currently use:
URL u = null;
InputStream is = null;
DataInputStream dis;
StringBuffer outData = new StringBuffer();
try {
    u = new URL(url);
    is = u.openStream();
    dis = new DataInputStream(new BufferedInputStream(is));
    String app = null;
    // Note: DataInputStream.readLine() is deprecated because it does not
    // convert bytes to characters properly.
    while ((app = dis.readLine()) != null) {
        outData = outData.append(app);
    }
} catch (MalformedURLException ex) {
    Log.e(TAG, "Malformed URL Exception", ex);
    return null;
} catch (IOException ex) {
    Log.e(TAG, "Error stream ", ex);
    return null;
} finally {
    try {
        // is stays null if the URL was malformed or the stream never opened.
        if (is != null) {
            is.close();
        }
    } catch (IOException ioe) {
    }
}
return outData.toString();
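The variants in this answer share one pattern: keep reading until the stream reports end of input, then return the accumulated text. A compact sketch of that pattern, with the charset passed explicitly and the reader closed by try-with-resources, might look like the following. `StreamText` and `readAll` are hypothetical names, and UTF-8 is an assumption here; a real page may declare a different encoding in its `Content-Type` header.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamText {

    // Reads the stream to its end and decodes it with the given charset.
    // Unlike readLine()-based loops, this keeps line terminators intact.
    public static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[4096];
        // Closing the reader also closes the underlying stream.
        try (Reader reader = new InputStreamReader(in, charset)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a network stream; \u7d0d\u7c73 are two CJK characters.
        byte[] bytes = "<html>\n\u7d0d\u7c73\n</html>".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.println("decoded characters: " + text.length());
        System.out.println("encoded bytes: " + bytes.length);
    }
}
```

With an HttpURLConnection, the same method could be given the stream returned by `getInputStream()`. Comparing the decoded length against what the log viewer shows is one way to tell a truncated download from a truncated display.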