How do I retrieve data from strong tags in an HTML file using jsoup?

Date: 2017-09-25 12:02:44

Tags: java html jsoup

I have some HTML data like this:
<div class="bs-example">
  <div class="panel panel-primary">
    <div class="panel-heading">
      <h3 class="panel-title">ABC</h3>
    </div>
    <div class="panel-body">
      <div class="slimScroller" style="height:280px; position: relative;" data-rail-visible="1" data-always-visible="1">
        <strong>Name:</strong>
        <a href="https://ABC"> </a><br />
        <strong>ID No:</strong> XXXXX<br />
        <strong>Status:</strong> ACTIVE<br />
        <strong>Class:</strong> 5<br />
        <strong>Category:</strong> A<br />
        <strong>Marks:</strong> 500<br />
      </div>
    </div>
  </div>
</div>

I want the output to look like this (the page contains data for multiple students):

 Name: ABC
 ID No.: XXXXX
 Status: Active
 Class: 5
 Category: A
 Marks: 500

How can I get this data using jsoup, or any other way? Please help.
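Since the question mentions data for several students, here is a minimal sketch that groups the fields per student instead of printing one flat list. It assumes each student is rendered as its own div.panel with the same markup as the sample; the class StudentPanels, the helpers extract and valueAfter, and the second student (XYZ / YYYYY) are hypothetical. Because the sample's <a> is empty, the name falls back to the h3.panel-title text, which is also an assumption about the real page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class StudentPanels {

    // Hypothetical helper: one ordered label -> value map per div.panel (one per student)
    static List<Map<String, String>> extract(Document doc) {
        List<Map<String, String>> students = new ArrayList<>();
        for (Element panel : doc.select("div.panel")) {
            Map<String, String> fields = new LinkedHashMap<>();
            for (Element label : panel.select("div.slimScroller strong")) {
                // Drop the trailing colon: "ID No:" -> "ID No"
                String key = label.text().replaceAll(":$", "").trim();
                fields.put(key, valueAfter(label, panel));
            }
            students.add(fields);
        }
        return students;
    }

    // The value is either the <a> after "Name:" or the text node right after </strong>
    static String valueAfter(Element label, Element panel) {
        Element nextEl = label.nextElementSibling();
        if (nextEl != null && nextEl.tagName().equals("a")) {
            String text = nextEl.text().trim();
            // Assumption: when the <a> is blank (as in the sample), the panel title holds the name
            return text.isEmpty() ? panel.select("h3.panel-title").text() : text;
        }
        Node next = label.nextSibling();
        return (next instanceof TextNode) ? ((TextNode) next).text().trim() : "";
    }

    public static void main(String[] args) {
        // Two hypothetical students, trimmed down from the question's markup
        String html =
              "<div class='panel panel-primary'>"
            + "<div class='panel-heading'><h3 class='panel-title'>ABC</h3></div>"
            + "<div class='panel-body'><div class='slimScroller'>"
            + "<strong>Name:</strong> <a href='https://ABC'> </a><br>"
            + "<strong>ID No:</strong> XXXXX<br>"
            + "<strong>Status:</strong> ACTIVE<br>"
            + "</div></div></div>"
            + "<div class='panel panel-primary'>"
            + "<div class='panel-heading'><h3 class='panel-title'>XYZ</h3></div>"
            + "<div class='panel-body'><div class='slimScroller'>"
            + "<strong>Name:</strong> <a href='https://XYZ'> </a><br>"
            + "<strong>ID No:</strong> YYYYY<br>"
            + "<strong>Status:</strong> INACTIVE<br>"
            + "</div></div></div>";

        // Print each student's fields, with a blank line between students
        for (Map<String, String> student : extract(Jsoup.parse(html))) {
            student.forEach((k, v) -> System.out.println(k + ": " + v));
            System.out.println();
        }
    }
}
```

Returning maps rather than printing directly keeps the parsing separate from the output format, so the caller can decide how to render "ID No" or normalize "ACTIVE".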

2 answers:

Answer 0 (score: 0)

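You can use Element.nextElementSibling() and/or Element.nextSibling() to get the desired output. Each label is a <strong>, and its value is either the <a> element that follows it (for Name) or the text node directly after it (for the other fields).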
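To make the difference between the two methods concrete: nextSibling() returns the very next node, which may be a text node, while nextElementSibling() skips text nodes and returns the next element. A small sketch on one line of the sample markup (the class name SiblingDemo is just for illustration):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class SiblingDemo {
    public static void main(String[] args) {
        Element label = Jsoup.parse("<div><strong>ID No:</strong> XXXXX<br></div>")
                .select("strong").first();

        // nextSibling() returns the very next node, here the text node " XXXXX"
        Node node = label.nextSibling();
        System.out.println(node instanceof TextNode);            // true
        System.out.println(((TextNode) node).text().trim());     // XXXXX

        // nextElementSibling() skips the text node and returns the <br>
        System.out.println(label.nextElementSibling().tagName()); // br
    }
}
```

This is why the answers below check the tag name of nextElementSibling() first: for "Name:" it is the <a>, while for every other label it is just the <br>, and the value has to come from nextSibling() instead.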

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

public class Exam {
  public static void main(String[] args) {
    String html =  "<div class=\"bs-example\">" +
                    "  <div class=\"panel panel-primary\">" +
                    "    <div class=\"panel-heading\">" +
                    "      <h3 class=\"panel-title\">ABC</h3>" +
                    "    </div>" +
                    "    <div class=\"panel-body\">" +
                    "      <div class=\"slimScroller\" style=\"height:280px; position: relative;\" data-rail-visible=\"1\" data-always-visible=\"1\">" +
                    "        <strong>Name:</strong>" +
                    "        <a href=\"https://ABC\"> </a><br />" +
                    "        <strong>ID No:</strong> XXXXX<br />" +
                    "        <strong>Status:</strong> ACTIVE<br />" +
                    "        <strong>Class:</strong> 5<br />" +
                    "        <strong>Category:</strong> A<br />" +
                    "        <strong>Marks:</strong> 500<br />" +
                    "      </div>" +
                    "    </div>" +
                    "  </div>" +
                    "</div>";

    Document doc = Jsoup.parse(html);
    // Every label ("Name:", "ID No:", ...) is a <strong> inside div.slimScroller
    Elements eles = doc.select("div.slimScroller strong");
    for (Element e : eles) {
      Element nextEl = e.nextElementSibling();
      String value;
      if (nextEl != null && nextEl.tagName().equals("a")) {
        // The <a> after "Name:" is empty in this sample, so take the name from its href
        value = nextEl.attr("href").replace("https://", "");
      } else {
        // Otherwise the value is the text node directly after </strong>, e.g. " XXXXX"
        Node next = e.nextSibling();
        value = (next instanceof TextNode) ? ((TextNode) next).text().trim() : "";
      }
      System.out.println(e.text() + " " + value);
    }
  }
}

Answer 1 (score: 0)

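Based on your comment about where the a tag is located (the name is the text inside it), the following code should produce the specified output: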

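import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

private static void printStudentInfo(Document document){
    // Each <strong> holds a label such as "Name:" or "ID No:"
    Elements labels = document.select("div.slimScroller strong");

    for(Element label : labels){
        Element nextEl = label.nextElementSibling();
        Node nextNode = label.nextSibling();
        // "Name:" is followed by an <a> holding the name; the other labels by a text node
        String value = (nextEl != null && nextEl.tagName().equals("a"))
                ? nextEl.text()
                : (nextNode instanceof TextNode ? ((TextNode) nextNode).text().trim() : "");
        System.out.println(label.text() + " " + value);
    }
}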
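A different technique from looking just one sibling ahead is to walk the child nodes of div.slimScroller in order, treating each <strong> as the start of a field and the next <br> as its end. This is a sketch under the assumption that every field follows the `<strong>label</strong> value<br>` pattern and, as in Answer 1, that the name is the text of the <a>; the class ChildNodeWalk and its method lines are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class ChildNodeWalk {

    // Hypothetical helper: a <strong> starts a field, everything up to the next <br>
    // (text nodes and the text of inline elements such as <a>) becomes its value.
    static List<String> lines(Element container) {
        List<String> out = new ArrayList<>();
        String label = null;
        StringBuilder value = new StringBuilder();
        for (Node child : container.childNodes()) {
            if (child instanceof Element && ((Element) child).tagName().equals("strong")) {
                label = ((Element) child).text();
                value.setLength(0);
            } else if (child instanceof Element && ((Element) child).tagName().equals("br")) {
                if (label != null) {
                    out.add(label + " " + value.toString().trim().replaceAll("\\s+", " "));
                    label = null;
                }
            } else if (label != null && child instanceof TextNode) {
                value.append(((TextNode) child).text());
            } else if (label != null && child instanceof Element) {
                value.append(' ').append(((Element) child).text());
            }
            // Other node types (e.g. comments) are ignored
        }
        // Flush a last field that is not terminated by <br>
        if (label != null) {
            out.add(label + " " + value.toString().trim().replaceAll("\\s+", " "));
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<div class='slimScroller'>"
                + "<strong>Name:</strong> <a href='https://ABC'>ABC</a><br>"
                + "<strong>ID No:</strong> XXXXX<br>"
                + "<strong>Status:</strong> ACTIVE<br>"
                + "<strong>Marks:</strong> 500<br>"
                + "</div>";
        Element box = Jsoup.parse(html).select("div.slimScroller").first();
        lines(box).forEach(System.out::println);
    }
}
```

Because the value is collected until the next <br>, a value made of text plus a link is joined into one line instead of being cut off after the first node.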