How to parse a whole HTML page in Android

Posted: 2013-06-17 10:20:26

Tags: android html-parsing

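I call an HTML page through a web service, and I need the whole source code of that page. My problem is that when I convert the HTTP response to a String, I only get part of the HTML page. How can I get the whole HTML page? Please help.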

// paramString1 = URL, paramString2 = Referer header, paramList = form parameters
public String a(String paramString1, String paramString2, List<NameValuePair> paramList)
{
    String str1 = null;
    HttpPost localHttpPost = new HttpPost(paramString1);
    localHttpPost.addHeader("Accept-Encoding", "gzip");
    InputStream localInputStream = null;
    try
    {
        localHttpPost.setEntity(new UrlEncodedFormEntity(paramList));
        localHttpPost.setHeader("Referer", paramString2);
        HttpResponse localHttpResponse = this.c.execute(localHttpPost);

        localInputStream = localHttpResponse.getEntity().getContent();
        Header localHeader = localHttpResponse.getFirstHeader("Content-Encoding");
        if ((localHeader != null) && (localHeader.getValue().equalsIgnoreCase("gzip")))
        {
            GZIPInputStream localObject = new GZIPInputStream(localInputStream);
            Log.d("API", "GZIP Response decoded!");
            BufferedReader localBufferedReader = new BufferedReader(new InputStreamReader(localObject, "UTF-8"));
            StringBuilder localStringBuilder = new StringBuilder();
            while (true)
            {
                String str2 = localBufferedReader.readLine();
                if (str2 == null)
                    break;
                localHttpResponse.getEntity().consumeContent();
                str1 = localStringBuilder.toString();
                localStringBuilder.append(str2);
            }
        }
    }
    catch (IOException localIOException)
    {
        localHttpPost.abort();
    }
    catch (Exception localException)
    {
        localHttpPost.abort();
    }
    return str1;
}
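Reading the loop above, a likely cause of the truncation is that consumeContent() is called on every iteration. Per the HttpEntity contract, that call releases the content stream and discards any remaining content, so after the first line only what BufferedReader had already buffered can still be read. In addition, str1 is assigned before the current line is appended (so the last line is always dropped), readLine() strips the line breaks, and a non-gzip response returns null. Below is a minimal stdlib-only sketch of reading the body to the end of the stream instead; the class and method names are hypothetical, not from the question or any library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class WholePageReader {

    // Reads the entire body into a String. Raw bytes are collected first and
    // decoded once at the end, so line breaks are preserved (unlike readLine())
    // and multi-byte characters are never split between chunks.
    public static String readFully(InputStream in, boolean gzipped, String charsetName) throws IOException {
        InputStream source = gzipped ? new GZIPInputStream(in) : in;
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            // Keep reading until end of stream. Nothing here consumes or closes
            // the entity mid-read, which is what cuts the page short.
            while ((n = source.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), charsetName);
        } finally {
            source.close(); // also closes the wrapped stream
        }
    }
}
```

In the question's method this would replace the whole if block: pass localHttpResponse.getEntity().getContent(), set gzipped from the Content-Encoding check that is already there, and call consumeContent() (if at all) only after readFully returns.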

2 answers:

Answer 0 (score: 0)

Are you receiving the HTML in the variable paramString1? If so, are you encoding the String in some way, or is it just plain HTML?

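Perhaps HTML special characters are breaking your response. Try encoding the String with URL-safe Base64 on the server side and decoding it on the client side.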

You can use the Base64 class from Apache Commons (Commons Codec).

Server side:

Base64 encoder = new Base64(true);
byte[] encodedBytes = encoder.encode(yourBytes);

Client side:

Base64 decoder = new Base64(true);
byte[] decodedBytes = decoder.decode(paramString1);
HttpPost localHttpPost = new HttpPost(new String(decodedBytes, "UTF-8"));
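For comparison, here is a URL-safe Base64 round trip using the JDK's own java.util.Base64 instead of Apache Commons Codec. This is an assumption about the environment: it needs Java 8 or Android API level 26+, which is newer than this thread. The class and method names are hypothetical, and the exact encoded output (padding, line breaks) may differ from Commons Codec's Base64(true), so the server and client should use the same library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class UrlSafeBase64Demo {

    // Server side: encode the HTML as URL-safe Base64 text,
    // which uses '-' and '_' instead of '+' and '/'.
    public static String encodeHtml(String html) {
        return Base64.getUrlEncoder().encodeToString(html.getBytes(StandardCharsets.UTF_8));
    }

    // Client side: decode the Base64 text back into the original HTML.
    public static String decodeHtml(String encoded) {
        return new String(Base64.getUrlDecoder().decode(encoded), StandardCharsets.UTF_8);
    }
}
```

Because the encoded text contains only letters, digits, '-', '_' and '=' padding, characters such as '&', '<' or '"' in the HTML cannot interfere with how it is transported.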

Answer 1 (score: 0)

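You may not be getting the complete source code in the StringBuilder because the page might exceed the maximum size of a StringBuilder, which is backed by a character array. If you want to store that particular source code, try this: write the InputStream data (which contains the HTML source) directly to a File. You will then have the complete source code in that file and can perform whatever file operations you need. See if this helps.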
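A minimal sketch of that suggestion, copying the InputStream straight to a File in fixed-size chunks so the page never has to be held in a StringBuilder. The class and method names are hypothetical; on Android the target might be something like new File(context.getCacheDir(), "page.html"), where the context variable and file name are likewise just examples.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamToFile {

    // Copies the whole stream into the target file and returns the number of
    // bytes written. Only one 8 KB chunk is held in memory at a time.
    public static long copyToFile(InputStream in, File target) throws IOException {
        OutputStream out = new FileOutputStream(target);
        try {
            byte[] chunk = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
                total += n;
            }
            return total;
        } finally {
            out.close();
            in.close();
        }
    }
}
```

The stream to pass is the one the question already reads from, localHttpResponse.getEntity().getContent(), wrapped in a GZIPInputStream when the response is gzip-encoded; otherwise the file will contain compressed bytes.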