How do I capture this text with Jsoup?

Asked: 2013-11-25 22:52:49

Tags: java html css parsing jsoup

I'm going crazy just trying to extract some text from this source code:

<tr class="even"> <!-- Title --> <td class="title riot" title="Summoners, We will be performing Live Maintenance on the 26/11 at 04:00 AM, where we will need to bring the EUW Platform offline. Following up...">

I have tried many combinations of selectors, but I can't get it working without some advice. I need to capture the text that comes after title= in the td element.

Note that there is a similar class called "odd", with the same syntax as the first one, like this:

<tr class="odd">
<!-- Title -->
<td class="title riot" title="Summoners, welcome to the Service Status forum! Here you can come to see information regarding ongoing issues or events that we are currently working...">

So I need something that captures the text from both classes.

Thanks for your help.

Edit: here is my code, where I connect and grab some links:

Document doc = Jsoup.connect("http://forums.euw.leagueoflegends.com/board/forumdisplay.php?f=10")
        .userAgent("Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.172 Safari/537.22")
        .timeout(30000)
        .get();
Elements links = doc.select("a[href*=thread]");
for (Element link : links) {
    if (link.attr("href").contains("board") || link.attr("href").contains("page") || link.text().matches("1")) {}
    else {
        titles.add((String) link.text());

        //descriptions.add((String) DEFAULT_FORUM_URL + link.attr("href"));
        descriptions.add((String) doc.select("[title*=a]").toString());
    }
}

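The commented-out line writes the thread's link into every second row of the ListView, but what I need to write there is the short description stored in the title attribute of td class="title riot" title=, for each of the two classes.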

Of course, this line

descriptions.add((String) doc.select("[title*=a]").toString());

does not work.

2 Answers:

Answer 0: (score: 1)

How about this:

Document doc = Jsoup.connect("http://forums.euw.leagueoflegends.com/board/forumdisplay.php?f=10").get();

for (Element element : doc.select("tr.odd > td, tr.even > td")) {
    System.out.println(element.attr("title"));
}

This will output:

Summoners, welcome to the Service Status forum! Here you can come to see information regarding ongoing issues or events that we are currently working...




Summoners, 

We will be performing a maintenance on 26/11 at 04:00 AM, where we will need to bring the EUW Platform offline. 

Following up on the...
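
The blank lines in that output probably come from cells that have no title attribute: "tr.odd > td, tr.even > td" matches every td child of a row, and attr("title") returns an empty string when the attribute is missing. A minimal sketch of a tighter selector follows. The class name TitleAttributeFilter, the method extractTitles, and the sample markup are made up for illustration; the real forum page may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TitleAttributeFilter {

    // Hypothetical markup: each row has one cell with a title attribute and one without.
    static final String SAMPLE_HTML = "<table>"
            + "<tr class=\"even\"><td class=\"title riot\" title=\"First description\"></td><td>12</td></tr>"
            + "<tr class=\"odd\"><td class=\"title riot\" title=\"Second description\"></td><td>7</td></tr>"
            + "</table>";

    // Collects the title attribute of every td that actually has one,
    // limited to direct children of rows with class "odd" or "even".
    static List<String> extractTitles(String html) {
        Document doc = Jsoup.parse(html);
        List<String> titles = new ArrayList<>();
        for (Element cell : doc.select("tr.odd > td[title], tr.even > td[title]")) {
            titles.add(cell.attr("title"));
        }
        return titles;
    }

    public static void main(String[] args) {
        for (String title : extractTitles(SAMPLE_HTML)) {
            System.out.println(title);
        }
    }
}
```

The [title] part keeps only elements that carry the attribute, so the plain cells (the "12" and "7" cells in the sample) are skipped instead of printing empty lines.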

Answer 1: (score: 0)

Here is an example:

public static final String text = "" +
    "<table><tr class=\"even\"> <!-- Title -->\n" +
    "    <td class=\"title riot\"\n" +
    "        title=\"Summoners, We will be performing Live Maintenance on the 26/11 at 04:00 AM, where we will need to bring the EUW Platform offline. Following up...\">\n" +
    "    </td>\n" +
    "</tr>\n" +
    "<tr class=\"odd\">\n" +
    "    <!-- Title -->\n" +
    "    <td class=\"title riot\"\n" +
    "        title=\"Summoners, welcome to the Service Status forum! Here you can come to see information regarding ongoing issues or events that we are currently working...\">\n" +
    "    </td>\n" +
    "</tr></table>";

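public static void main(String[] args) {
    // Jsoup.parse(String) works on an in-memory string, so no checked exception is declared.
    Document doc = Jsoup.parse(text);

    //System.out.println("your doc:" + doc);

    // Select every td that is a direct child of a tr and print its title attribute.
    for (Element element : doc.select("tr > td")) {
        System.out.println(element.attr("title"));
    }
}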

Prints:

Summoners, We will be performing Live Maintenance on the 26/11 at 04:00 AM, where we will need to bring the EUW Platform offline. Following up...
Summoners, welcome to the Service Status forum! Here you can come to see information regarding ongoing issues or events that we are currently working...
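
To tie this back to the loop in the question: doc.select("[title*=a]").toString() is evaluated against the whole document, so each link gets the markup of every matching element rather than its own description. One possible way to keep titles and descriptions aligned is to walk the rows and read both values from the same row. This is only a sketch under an assumption: the sample markup places the thread link inside the same tr as the td with the title attribute, which the question does not confirm. The names ThreadRows and titleDescriptionPairs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ThreadRows {

    // Hypothetical markup: assumes the thread link lives in the same row as the titled cell.
    static final String SAMPLE_HTML = "<table>"
            + "<tr class=\"even\"><td class=\"title riot\" title=\"Maintenance notice\">"
            + "<a href=\"showthread.php?t=1\">Live Maintenance</a></td></tr>"
            + "<tr class=\"odd\"><td class=\"title riot\" title=\"Welcome notice\">"
            + "<a href=\"showthread.php?t=2\">Welcome</a></td></tr>"
            + "</table>";

    // Returns one {link text, description} pair per row that has both pieces.
    static List<String[]> titleDescriptionPairs(String html) {
        Document doc = Jsoup.parse(html);
        List<String[]> pairs = new ArrayList<>();
        for (Element row : doc.select("tr.odd, tr.even")) {
            // Queries are scoped to the current row, not the whole document.
            Element cell = row.select("td.title[title]").first();
            Element link = row.select("a[href*=thread]").first();
            if (cell == null || link == null) {
                continue; // skip rows missing either the description or the link
            }
            pairs.add(new String[] { link.text(), cell.attr("title") });
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] pair : titleDescriptionPairs(SAMPLE_HTML)) {
            System.out.println(pair[0] + " | " + pair[1]);
        }
    }
}
```

In the question's code, pair[0] would go into titles and pair[1] into descriptions, so the two lists stay the same length and in the same order.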