Jsoup: getting values from elements that all share the same class name

Date: 2014-03-24 12:17:58

Tags: jsoup

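This is my HTML. From it I want to get two details:

Publisher: Springer-Verlag, Price: $7,284

The problem is that all the outer and inner class names are the same. Please suggest how to get the two values above from the HTML below using jsoup.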

<div class="details">
    <div class="fullname">ANALYTICAL AND BIOANALYTICAL CHEMISTRY (2011)</div>
    <div class="catbox">
        <div class="catcontents">
            <div class="contents_ct1">Eigenfactor Category:</div>
            <div class="contents_ct2" style="margin-left: -5px;">ANALYTIC CHEMISTRY</div>
        </div>
        <div class="catcontents">
            <div class="contents_ct1">ISI Category:</div>
            <div class="contents_ct2" style="margin-left: -49px;">CO EA</div>
        </div>
        <div class="catcontents">
            <div class="contents_ct1">Group:</div>
            <div class="contents_ct2" style="margin-left: -80px;">Sci</div>
        </div>
        <div class="catcontents">
            <div class="contents_ct1">Total Articles (5yrs):</div>
            <div class="contents_ct2" style="margin-left: -12px;">3,544</div>
        </div>
    </div>
    <div class="catbox" style="margin-left: 20px">
        <div class="catcontents">
            <div class="contents_ct1">Publisher:</div>
            <div class="contents_ct2" style="margin-left: -55px;">Springer-Verlag</div>
        </div>
        <div class="catcontents">
            <div class="contents_ct1">First Published:</div>
            <div class="contents_ct2" style="margin-left: -35px;">2001</div>
        </div>
        <div class="catcontents">
            <div class="contents_ct1"><a href="http://journalprices.com/" title="Prices provided by JournalPrices.com" target="_blank" style="font-size: 11px">Price:</a></div>
            <div class="contents_ct2" style="margin-left: -80px;">$7,284</div>
        </div>
        <div class="catcontents">
            <div class="contents_ct1">Cost Effectiveness:</div>
            <div class="contents_ct2" style="margin-left: -18px;">1.0302</div>
        </div>
    </div>
    <div class="tgraph">
        <div class="plotB">
            <iframe src="plot1.php?issn=1618-2642" width="370px" height="220px" frameborder=0 scrolling="no"></iframe>
        </div>
        <div class="plotB" style="margin-left: 10px">
            <iframe src="plot2.php?issn=1618-2642" width="340px" height="220px" frameborder=0 scrolling="no"></iframe>
        </div>
    </div>
</div>

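1 answer:

Answer 0 (score: 1)

Static HTML structure

Assuming the layout always follows the structure of the source you provided, you can use plain CSS selector syntax to specify which elements to parse.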

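// doc is the already parsed org.jsoup.nodes.Document; :eq(n) matches on the zero-based sibling index.
// div.catbox:eq(2) is the second catbox, because div.fullname occupies sibling index 0.
Element publisher = doc.select("div.catbox:eq(2) div.catcontents div.contents_ct2").first();
// div.catcontents:eq(2) is the third label/value pair in that box (the "Price:" row).
Element price = doc.select("div.catbox:eq(2) div.catcontents:eq(2) div.contents_ct2").first();
System.out.println("Publisher: " + publisher.text() + "\nPrice: " + price.text());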

This prints:

run:
Publisher: Springer-Verlag
Price: $7,284
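
To try the two selectors above in isolation, here is a minimal self-contained sketch. It assumes jsoup is on the classpath and uses a trimmed copy of the question's HTML; the class name `EqIndexDemo` and the helper `selectText` are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class EqIndexDemo {
    // Trimmed copy of the question's markup (assumption: the real page keeps this nesting).
    static final String HTML =
          "<div class=\"details\">"
        + "<div class=\"fullname\">ANALYTICAL AND BIOANALYTICAL CHEMISTRY (2011)</div>"
        + "<div class=\"catbox\">"
        + "<div class=\"catcontents\"><div class=\"contents_ct1\">Group:</div>"
        + "<div class=\"contents_ct2\">Sci</div></div>"
        + "</div>"
        + "<div class=\"catbox\">"
        + "<div class=\"catcontents\"><div class=\"contents_ct1\">Publisher:</div>"
        + "<div class=\"contents_ct2\">Springer-Verlag</div></div>"
        + "<div class=\"catcontents\"><div class=\"contents_ct1\">First Published:</div>"
        + "<div class=\"contents_ct2\">2001</div></div>"
        + "<div class=\"catcontents\"><div class=\"contents_ct1\">"
        + "<a href=\"http://journalprices.com/\">Price:</a></div>"
        + "<div class=\"contents_ct2\">$7,284</div></div>"
        + "</div>"
        + "</div>";

    // Hypothetical helper: text of the first element matching the selector, or null if none matches.
    static String selectText(String css) {
        Document doc = Jsoup.parse(HTML);
        Element match = doc.select(css).first();
        return match == null ? null : match.text();
    }

    public static void main(String[] args) {
        // div.fullname has sibling index 0, so the two catbox divs have indexes 1 and 2.
        System.out.println("Publisher: "
            + selectText("div.catbox:eq(2) div.catcontents div.contents_ct2"));
        System.out.println("Price: "
            + selectText("div.catbox:eq(2) div.catcontents:eq(2) div.contents_ct2"));
    }
}
```

Because `:eq(n)` counts siblings rather than matches, adding or removing any element before the target shifts the index, which is why this approach only suits a fixed layout.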

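Dynamic HTML structure

If the structure is not always the same, the code below should produce the same result, but it identifies the elements by checking their text instead of relying on their position.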

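// Each div.catcontents wraps one label/value pair, so its text() reads e.g. "Publisher: Springer-Verlag".
Elements content = doc.select("div.catcontents");
Element publisher = null;
Element price = null;
for (Element element : content) {
    String text = element.text();
    if (text.startsWith("Publisher")) {
        publisher = element;
    } else if (text.startsWith("Price")) {
        price = element;
    }
}
// Guard against a missing label; otherwise publisher.text() would throw a NullPointerException.
if (publisher != null && price != null) {
    System.out.println(publisher.text() + "\n" + price.text());
}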
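
The same text-based idea can also be pushed into the selector itself. The sketch below is a variant, not the answer's original method: it combines jsoup's `:contains(text)` pseudo-selector with the adjacent sibling combinator `+` to pick the `contents_ct2` value that directly follows a given label, so only the value is returned. The class name `LabelLookup` and the helper `valueFor` are hypothetical, and the HTML is a trimmed copy of the question's markup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LabelLookup {
    // Trimmed copy of the question's markup (assumption: label and value stay adjacent siblings).
    static final String HTML =
          "<div class=\"details\">"
        + "<div class=\"catbox\">"
        + "<div class=\"catcontents\"><div class=\"contents_ct1\">Publisher:</div>"
        + "<div class=\"contents_ct2\">Springer-Verlag</div></div>"
        + "<div class=\"catcontents\"><div class=\"contents_ct1\">First Published:</div>"
        + "<div class=\"contents_ct2\">2001</div></div>"
        + "<div class=\"catcontents\"><div class=\"contents_ct1\">"
        + "<a href=\"http://journalprices.com/\">Price:</a></div>"
        + "<div class=\"contents_ct2\">$7,284</div></div>"
        + "<div class=\"catcontents\"><div class=\"contents_ct1\">Cost Effectiveness:</div>"
        + "<div class=\"contents_ct2\">1.0302</div></div>"
        + "</div>"
        + "</div>";

    // Hypothetical helper: value text for the first label containing the given words, or null.
    // :contains is a case-insensitive substring match that also looks at descendant text,
    // so it finds "Price:" even though that label is wrapped in an <a> element.
    static String valueFor(String html, String label) {
        Document doc = Jsoup.parse(html);
        Element value = doc
            .select("div.contents_ct1:contains(" + label + ") + div.contents_ct2")
            .first();
        return value == null ? null : value.text();
    }

    public static void main(String[] args) {
        System.out.println("Publisher: " + valueFor(HTML, "Publisher"));
        System.out.println("Price: " + valueFor(HTML, "Price"));
    }
}
```

Since `:contains` matches substrings, a short label such as "Pu" could hit more than one row; passing the full label text keeps the match unambiguous. Labels are concatenated into the selector as-is here, so ones containing selector syntax may need escaping.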