Question

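I am developing an application on Android.

I have the full HTML content of a web page in a string, and I need to extract the text of every paragraph (p element) that has class="content".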

Example:

<p class="content">La la la</p>
<p class="another">Le le le</p>
<p class="content">Li li li</p>

Result:

La la la
Li li li

What is the best way to do this?

Answer 1

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class Test {

    // Downloads the page from the server and prints it line by line.
    void readScreen() {
        try {
            URL url = new URL("http://somewebsite.com");

            // Note: a more portable URL:
            // url = new URL(getCodeBase().toString() + "/ToDoList/ToDoList.txt");

            URLConnection urlConn = url.openConnection();
            urlConn.setDoInput(true);
            urlConn.setUseCaches(false);

            // BufferedReader replaces the deprecated DataInputStream.readLine().
            // The UTF-8 charset is an assumption; adjust it to the page's encoding.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(urlConn.getInputStream(), "UTF-8"))) {
                String s;
                while ((s = reader.readLine()) != null) {
                    System.out.println(s); // one line of the server's response
                }
            }
        } catch (MalformedURLException mue) {
            // The URL string could not be parsed.
            mue.printStackTrace();
        } catch (IOException ioe) {
            // The connection or the read failed.
            ioe.printStackTrace();
        }
    }

    public static void main(String[] args) {
        Test thisTest = new Test();
        thisTest.readScreen();
    }
}

Answer 2

Regular expressions are your best bet.

http://download-llnw.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
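
A minimal sketch of the regex approach, applied to the example from the question. The class name ContentExtractor and the method extractContent are hypothetical names chosen for illustration. The pattern assumes the class attribute is exactly "content" (a single class, quoted) and that p elements are not nested; a regex like this is fragile against arbitrary real-world HTML, so treat it as a starting point rather than a complete solution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentExtractor {

    // Matches <p ... class="content" ...>text</p> and captures the text.
    // DOTALL lets the captured text span several lines; the reluctant
    // quantifier (.*?) stops at the first closing </p>.
    private static final Pattern CONTENT_P = Pattern.compile(
            "<p\\b[^>]*\\bclass\\s*=\\s*[\"']content[\"'][^>]*>(.*?)</p>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Hypothetical helper: returns the inner text of every matching paragraph.
    public static List<String> extractContent(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = CONTENT_P.matcher(html);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p class=\"content\">La la la</p>\n"
                + "<p class=\"another\">Le le le</p>\n"
                + "<p class=\"content\">Li li li</p>";
        // Prints one extracted paragraph per line.
        for (String text : extractContent(html)) {
            System.out.println(text);
        }
    }
}
```

With the sample input, the loop prints "La la la" and "Li li li". Any markup nested inside a matching paragraph is returned as-is, so further cleanup would be needed if the paragraphs contain other tags.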
