Question

I'm trying to parse some HTML in an Android app, and I need to get the text:

Pan Artesano Elaborado por Panadería La Constancia. ¡Esta Buenísimo!

in

Is there a simple way to get the text and strip out all the HTML tags?

The behavior I need is exactly what the PHP function documented at http://php.net/manual/es/function.strip-tags.php does.

Answer 1

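import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Parse the HTML and look up the container element by its id.
Document doc = Jsoup.parse(html);
Element content = doc.getElementById("someid");
// Collect every <p> element inside the container.
Elements paragraphs = content.getElementsByTag("p");

// text() returns an element's content with the tags removed.
StringBuilder pConcatenated = new StringBuilder();
for (Element p : paragraphs) {
    pConcatenated.append(p.text());
}

System.out.println(pConcatenated); // sometext another p tag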

Answer 2

When you just want to display it, a WebView will help: simply load the string into the WebView.

When you want to use the text somewhere else, I'm out of ideas :D.

String data = "your html here";
WebView webview = (WebView) this.findViewById(R.id.webview);
webview.getSettings().setJavaScriptEnabled(true);
webview.loadDataWithBaseURL("", data, "text/html", "UTF-8", "");

You can also load a page directly from its URL. loadDataWithBaseURL only uses its first argument to resolve relative links, so the call for that is webview.loadUrl("url");
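
For the case where the text is needed outside a WebView, here is a rough sketch in plain Java of something close to PHP's strip_tags. The class name TagStripper and the method stripTags are made up for illustration, and the regex approach is an assumption: it is naive, does not understand comments, scripts, or a ">" inside an attribute value, and decodes only a few entities (which strip_tags itself does not do). On Android, android.text.Html.fromHtml(html).toString() is another option worth trying.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class TagStripper {
    // Anything that looks like a tag: '<', a run of non-'>' characters, then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String stripTags(String html) {
        String noTags = TAG.matcher(html).replaceAll("");
        // Decode a handful of common entities; "&amp;" goes last so that
        // "&amp;lt;" becomes "&lt;" rather than "<".
        return noTags.replace("&nbsp;", " ")
                     .replace("&lt;", "<")
                     .replace("&gt;", ">")
                     .replace("&quot;", "\"")
                     .replace("&amp;", "&")
                     .trim();
    }

    public static void main(String[] args) {
        String html = "<div><p>Hello <b>world</b> &amp; friends</p></div>";
        System.out.println(stripTags(html));
    }
}
```

For anything beyond simple, well-formed snippets, a real parser such as the Jsoup approach in Answer 1 is likely the safer choice.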

Answer 3

First, get the HTML code with:

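import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

// Issue a GET request for the page.
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);

// Read the response body line by line into a single string.
// Note: readLine() strips line terminators, so the lines are joined without breaks.
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
    str.append(line);
}
reader.close(); // also closes the underlying stream
String html = str.toString();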
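
The read loop above joins lines without separators and uses the platform default charset. As a hedged sketch of a variant that keeps line breaks, fixes the charset, and always closes the stream, the following uses only the standard library. The names StreamText and readAll are hypothetical, and UTF-8 is an assumption about the page's encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class StreamText {
    // Reads the whole stream as UTF-8 text (an assumption about the encoding),
    // ending every line with '\n'.
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader, and with it the stream, even on failure.
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "<p>one</p>\n<p>two</p>".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(bytes)));
    }
}
```

In the snippet above, this would be called as readAll(response.getEntity().getContent()).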

Then I suggest creating a custom tag in the HTML, e.g. <toAndroid></toAndroid>, so that you can get the text with:

String result = html.substring(html.indexOf("<toAndroid>") + "<toAndroid>".length(), html.indexOf("</toAndroid>"));

For example, HTML such as

<toAndroid>Hello world!</toAndroid>

will result in

Hello world!

Note that you can put <p> tags inside the <toAndroid> tag and then remove them from the result.
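
The custom-tag idea, including the removal of nested <p> tags, could be sketched as follows. CustomTagExtractor and extract are hypothetical names, and returning null when the tag is missing is an assumption made here for illustration; the nested-tag removal uses a simple regex that only suits plain tags like <p>.

```java
// Hypothetical helper, not part of any library.
final class CustomTagExtractor {
    // Returns the text between <tagName> and </tagName> with nested tags removed,
    // or null if the tag pair is not found.
    static String extract(String html, String tagName) {
        String open = "<" + tagName + ">";
        String close = "</" + tagName + ">";

        int start = html.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length(); // skip past the opening tag itself

        int end = html.indexOf(close, start);
        if (end < 0) {
            return null;
        }

        String inner = html.substring(start, end);
        // Drop any remaining tags such as <p> and </p>.
        return inner.replaceAll("<[^>]*>", "").trim();
    }

    public static void main(String[] args) {
        String html = "<html><toAndroid><p>Hello world!</p></toAndroid></html>";
        System.out.println(extract(html, "toAndroid"));
    }
}
```

Searching for the closing tag from the end of the opening tag avoids picking up an unrelated closing tag that appears earlier in the page.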

Parsing HTML text in Android