Stripping HTML tags from source code

Date: 2013-03-19 14:55:34

Tags: java android html regex

HTML = EntityUtils.toString(response.getEntity());
ResponseHandler<String> responseHandler = new BasicResponseHandler();
String ResponseBody = httpclient.execute(httppost, responseHandler);
table = ResponseBody.substring(ResponseBody.indexOf("<table border=\"1\" cellpadding=\"0\" width=\"100%\" cellspacing=\"0\">"));
table = table.substring(0, table.indexOf("</table>"));  

String htmlString = table;
String noHTMLString = htmlString.replaceAll("\\<.*?\\>", "");
noHTMLString = noHTMLString.replaceAll("\r", "<br/>");
noHTMLString = noHTMLString.replaceAll("\n", " ");
noHTMLString = noHTMLString.replaceAll("\'", "&#39;");
noHTMLString = noHTMLString.replaceAll("\"", "&quot;");

TextView WORK = (TextView) findViewById(R.id.HTML);
WORK.setText(htmlString); 

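I'm using a regular expression to strip the markup out of some HTML. This is my code. It looks correct, but what gets displayed is the table (the substring) rather than the extracted text. Does anyone know why?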

1 answer:

Answer 0 (score: 2)

You have to use the new String object as the source for the TextView. Change this:

WORK.setText(htmlString);

to this:

WORK.setText(noHTMLString);
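
The reason is that Java strings are immutable: replaceAll never modifies htmlString, it returns a new String, so the stripped text only exists in noHTMLString. Here is a minimal sketch of that behaviour outside Android. The class name StripTagsExample, the helper stripTags, and the sample input are made up for illustration; only the regex is taken from the question.

```java
public class StripTagsExample {

    // Hypothetical helper (not part of the question's code): removes anything
    // that looks like a tag, using the same regex as the question.
    public static String stripTags(String html) {
        // replaceAll returns a NEW String; the argument is left untouched.
        return html.replaceAll("\\<.*?\\>", "");
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for the extracted table.
        String htmlString = "<table><tr><td>Hello</td></tr></table>";
        String noHTMLString = stripTags(htmlString);

        // The original still contains the markup.
        System.out.println(htmlString);   // <table><tr><td>Hello</td></tr></table>
        // Only the returned String has the tags removed.
        System.out.println(noHTMLString); // Hello
    }
}
```

So whichever variable is passed to setText decides what shows up on screen: htmlString gives the raw table, noHTMLString gives the stripped text.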