Question

<div></div>
    <div></div>
           <div></div>
            <div></div>
                <ul>
        <form id=the_main_form method="post">
                   <li>
                       <div></div>
                       <div> <h2> 
                <a onclick="xyz;" target="_blank" href="http://sample.com" style="text-decoration:underline;">This is sample</a>
                 </h2></div>
                       <div></div>
                        <div></div>
                      </li>

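There are 50 <li> items like this one.

I have posted an HTML snippet taken from a much larger HTML page.

<div></div> => means there was data between the tags; I removed it because it is not needed here.

How do I write a jsoup select statement that extracts the href and the link text?

I tried doc.select("div div div ul xxxx");

where xxxx is the form. Should I use the form id there, or what should I do instead?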

Answer 1

Try this:

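Elements allLis = doc.select("#the_main_form > li ");
for (Element li : allLis) {
    // select() returns Elements, so take the first match (null if there is none)
    Element a = li.select("div:eq(1) > h2 > a").first();
    if (a == null) continue;
    String href = a.attr("href");
    String text = a.text();
}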

Hope it helps!

Edit

Elements allLis = doc.select("#the_main_form > li ");

This part of the code selects all the <li> tags that are direct children of the <form> whose id is the_main_form.

Element a = li.select("div:eq(1) > h2 > a").first();

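Then we iterate over all the <li> tags and get the <a> tag in each one: first we take the <div> with index = 1 (the second child inside each <li>, hence div:eq(1)), then the <h2> tag inside it, which contains our <a> tag.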

Hope that makes sense now! :)
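For anyone who wants to try the selector end to end, here is a minimal sketch that runs it against a cut-down copy of the structure from the question. The class name FormLinkExtractor, the SAMPLE_HTML constant, the second <li> (with its example.com link) and the "href | text" output format are all made up for illustration; only the two selectors come from the answer above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class FormLinkExtractor {

    // Hypothetical, trimmed-down version of the question's markup.
    // The second <li> is invented so that the loop has more than one item.
    static final String SAMPLE_HTML =
          "<ul><form id=\"the_main_form\" method=\"post\">"
        + "<li><div></div><div><h2>"
        + "<a target=\"_blank\" href=\"http://sample.com\">This is sample</a>"
        + "</h2></div><div></div><div></div></li>"
        + "<li><div></div><div><h2>"
        + "<a href=\"http://example.com/second\">Second sample</a>"
        + "</h2></div><div></div><div></div></li>"
        + "</form></ul>";

    /** Returns one "href | text" string per link found (format is illustrative). */
    public static List<String> extractLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String> links = new ArrayList<>();
        // Every <li> that is a direct child of the form with id the_main_form
        for (Element li : doc.select("#the_main_form > li")) {
            // Second child <div> of the <li>, then its <h2>, then the <a>
            Element a = li.select("div:eq(1) > h2 > a").first();
            if (a == null) {
                continue; // this <li> has no link in the expected position
            }
            links.add(a.attr("href") + " | " + a.text());
        }
        return links;
    }

    public static void main(String[] args) {
        extractLinks(SAMPLE_HTML).forEach(System.out::println);
    }
}
```

Note that div:eq(1) depends on the link always sitting in the second child of each <li>. If the real page varies, a looser selector such as "h2 > a" inside each <li> may be safer.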

Answer 2

Please try this:

package com.stackoverflow.works;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

/*
 * @ author: sarath_sivan
 */

public class HtmlParserService {

    public static void parseHtml(String html) {
        Document document = Jsoup.parse(html);
        Element linkElement = document.select("a").first();
        String linkHref = linkElement.attr("href"); // "http://sample.com"
        String linkText = linkElement.text(); // "This is sample"
        System.out.println(linkHref);
        System.out.println(linkText);
    }

    public static void main(String[] args) {
        String html = "<a onclick=\"xyz;\" target=\"_blank\" href=\"http://sample.com\" style=\"text-decoration:underline;\">This is sample</a>";
        parseHtml(html);
    }

}

I hope you have the Jsoup library on your classpath.
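The code above reads only the first <a> in the document. Since the question mentions about 50 such items, a sketch that loops over every link could look like the following. The class name AllLinksParser and the choice of a map keyed by href are my own assumptions, not part of the original code; a map also means that two links with the same href collapse into one entry.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

public class AllLinksParser {

    /** Maps each href to its link text, in document order. */
    public static Map<String, String> parseAllLinks(String html) {
        Document document = Jsoup.parse(html);
        Map<String, String> result = new LinkedHashMap<>();
        // a[href] matches only <a> elements that actually carry an href attribute
        for (Element link : document.select("a[href]")) {
            // If the same href appears twice, the later text overwrites the earlier one
            result.put(link.attr("href"), link.text());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<a onclick=\"xyz;\" target=\"_blank\" href=\"http://sample.com\">This is sample</a>"
                    + "<a name=\"anchor-only\">No href here</a>";
        parseAllLinks(html).forEach((href, text) -> System.out.println(href + " -> " + text));
    }
}
```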

Thanks!

Getting data from a form using Java
