Parsing a string with key-value pairs using the Jsoup library

Date: 2014-01-07 13:46:43

Tags: java jsoup

I'm stuck on how to parse the following data into key-value pairs. Please guide me.

<div class="content">
    <div class="label">Company Name: </div>
    Cartell Chemical Co., Ltd.
    <br/>
    <div class="label">Business Owner: </div>
    Michael Chen
    <br/>
    <div class="label">Employees: </div>
    210
    <br/>
    <div class="label">Main markets: </div>
    North America, Europe, China, South Asia
    <br/>
    <div class="label">Business Type: </div>
    Manufacturer
    <br/>
</div>

I need the output in the format below. Please show me how to do this with Java and the Jsoup library.

Company Name:Cartell Chemical Co., Ltd.
Business Owner:Michael Chen
Employees:210
Main markets:North America, Europe, China, South Asia
Business Type:Manufacturer

2 answers:

Answer 0 (score: 4)

Have a look at the documentation.

Here is a working example:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

public class StackOverflow20973268 {
    private static String input = "<div class=\"content\">" +
            "<div class=\"label\">Company Name: </div>" +
            "Cartell Chemical Co., Ltd." +
            "<br/>" +
            "<div class=\"label\">Business Owner: </div>" +
            "Michael Chen" +
            "<br/>" +
            "<div class=\"label\">Employees: </div>" +
            "210" +
            "<br/>" +
            "<div class=\"label\">Main markets: </div>" +
            "North America, Europe, China, South Asia" +
            "<br/>" +
            "<div class=\"label\">Business Type: </div>" +
            "Manufacturer" +
            "<br/>" +
            "</div>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(input);
        // Each div.label holds a key; the text node right after it holds the value.
        Elements labels = doc.select("div.content div.label");
        for (Element label : labels) {
            Node next = label.nextSibling();
            String value = next instanceof TextNode ? ((TextNode) next).text().trim() : "";
            // label.text() already ends with ':', so no extra separator is added.
            System.out.println(label.text().trim() + value);
        }
    }
}

Output:

Company Name:Cartell Chemical Co., Ltd.
Business Owner:Michael Chen
Employees:210
Main markets:North America, Europe, China, South Asia
Business Type:Manufacturer
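
If the pairs are needed as data rather than as printed lines, the same selector can feed a map. The following is only a minimal sketch and not part of the original answer: the class name `LabelMapSketch` and the method `toMap` are hypothetical, and it assumes every value is the text node that immediately follows its `div.label`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical helper class, for illustration only.
public class LabelMapSketch {

    // Assumption: each value is the text node directly after its div.label.
    static Map<String, String> toMap(String html) {
        Map<String, String> pairs = new LinkedHashMap<>(); // keeps document order
        for (Element label : Jsoup.parse(html).select("div.content div.label")) {
            // Drop the trailing colon so the key is just the field name.
            String key = label.text().replaceAll(":\\s*$", "");
            Node next = label.nextSibling();
            String value = next instanceof TextNode ? ((TextNode) next).text().trim() : "";
            pairs.put(key, value);
        }
        return pairs;
    }

    public static void main(String[] args) {
        String html = "<div class=\"content\">"
                + "<div class=\"label\">Company Name: </div>Cartell Chemical Co., Ltd.<br/>"
                + "<div class=\"label\">Employees: </div>210<br/>"
                + "</div>";
        toMap(html).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

A `LinkedHashMap` is used here so the entries come back in the order they appear in the HTML; a plain `HashMap` would also work if order does not matter.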

Answer 1 (score: -1)

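The Jsoup library is well suited to parsing HTML. It lets you extract values by class or id name, or by traversing the DOM tree. You basically get a div element and look at its child nodes, which can be text nodes (containing the text to parse) or other elements with child nodes of their own. For example, you could do something like this (untested, partly pseudocode):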

    Document doc = Jsoup.parse(info);
    Elements divs = doc.body().getElementsByTag("div");
    for (Element divElement : divs) {
        // divElement.textNodes() returns the text nodes that are direct children of this div
        // divElement.nextSibling() returns the node that follows it, which may be a text node
    }

Basically, find the element and step to its text or to the next/previous node.
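
The traversal described in this answer could be sketched roughly as follows. This is an illustration rather than the answerer's code: the class `ChildNodeWalkSketch` and the method `toLines` are hypothetical names, and the sketch assumes the labels and values are direct children of `div.content`, as in the question's markup.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical class, for illustration only.
public class ChildNodeWalkSketch {

    // Walks the direct children of div.content and pairs each label with the
    // first non-blank text node that follows it.
    static List<String> toLines(String html) {
        List<String> lines = new ArrayList<>();
        Element content = Jsoup.parse(html).selectFirst("div.content");
        if (content == null) {
            return lines; // no matching container in the input
        }
        String key = null; // most recent label not yet paired with a value
        for (Node child : content.childNodes()) {
            if (child instanceof Element && ((Element) child).hasClass("label")) {
                key = ((Element) child).text(); // e.g. "Company Name:"
            } else if (child instanceof TextNode && key != null) {
                String value = ((TextNode) child).text().trim();
                if (!value.isEmpty()) { // skip whitespace-only nodes between tags
                    lines.add(key + value);
                    key = null;
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String html = "<div class=\"content\">\n"
                + "    <div class=\"label\">Business Owner: </div>\n"
                + "    Michael Chen\n"
                + "    <br/>\n"
                + "</div>";
        toLines(html).forEach(System.out::println);
    }
}
```

Skipping blank text nodes matters when the HTML is indented as in the question, because the whitespace between tags may show up as text nodes of its own.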