Parsing HTML on Android with a combination of Jsoup selectors

Time: 2013-07-15 16:13:46

Tags: android jsoup

I want to extract the values for <dt>Seeders:</dt> and <dt>Leechers:</dt> from the HTML below using Jsoup. The full markup and my current code follow.

<div id="details">
    <dl class="col1">
        <dt>Type:</dt>
        <dd><a href="/browse/101" title="More from this category">Audio &gt; Music</a></dd>

        <dt>Files:</dt>
                <dd><a href="/torrent/8682317/" title="Files" onclick="
if (filelist &lt; 1) {
        new Ajax.Updater('filelistContainer', '/ajax_details_filelist.php', {method: 'get', parameters: 'id=8682317'});
        filelist=1;
}; toggleFilelist(); return false;">28</a></dd>

        <dt>Size:</dt>
        <dd>222.65&nbsp;MiB&nbsp;(233468815&nbsp;Bytes)</dd>
        <br />



                    <dt>Tag(s):</dt>
            <dd><a href="/tag/markus">markus</a> <a href="/tag/schulz">schulz</a> <a href="/tag/dakota">dakota</a> <a href="/tag/things">things</a> <a href="/tag/trance">trance</a> <a href="/tag/armada">armada</a> <a href="/tag/2011">2011</a> <a href="/tag/inspiron">inspiron</a> </dd>
                <br />
        <dt>Uploaded:</dt>
        <dd>2013-07-13 15:30:25 GMT</dd>
        <dt>By:</dt>
        <dd>
        <a href="/user/-inspiron-/" title="Browse -inspiron-">-inspiron-</a>&nbsp;<img src="/static/img/vip.gif" alt="VIP" title="VIP" style="width:11px;" border='0' /></dd>
        <br />

        <dt>Seeders:</dt>
        <dd>16</dd>

        <dt>Leechers:</dt>
        <dd>1</dd>

        <dt>Comments</dt>
        <dd><span id="NumComments">0</span>
                &nbsp;
                </dd>

        <br />
        <dt>Info Hash:</dt><dd>&nbsp;</dd>
        01DD6B7325C3DB5F0DF5BBE510FD3FD9738D1C88    </dl>
<div class="torpicture">
<img src="//image.bayimg.com/345b5b11734bb9973863359cc52929f3ddc45205.jpg" title="picture" alt="picture" />
</div>
    <dl class="col2">
    </dl>

    <div id="CommentDiv" style="display:none;">
        <form method="post" id="commentsform" name="commentsform" onsubmit="new Ajax.Updater('NumComments', '/ajax_post_comment.php', {evalScripts:true, asynchronous:true, parameters:Form.serialize(this)}); return false;" action="/ajax_post_comment.php">
            <p class="info">
                <textarea name="add_comment" id="add_comment" rows="8" cols="50"></textarea><br/>
                <input type="hidden" name="id" value="8682317"/>
                <input type="submit" value="Submit" /><input type="button" value="Hide" onclick="document.getElementById('CommentDiv').style.display = 'none'" />
            </p>
        </form>
    </div>
        <br/>
        <br/>
<div id="social">
</div>

         <iframe src="http://cdn1.adexprt.com/dl/dl.php?b=bar&r=75&n=Markus_Schulz_-_Global_DJ_Broadcast_%282013-07-11%29_%28Inspiron%29&m=magnet%3A%3Fxt%3Durn%3Abtih%3A01dd6b7325c3db5f0df5bbe510fd3fd9738d1c88%26dn%3DMarkus%2BSchulz%2B-%2BGlobal%2BDJ%2BBroadcast%2B%25282013-07-11%2529%2B%2528Inspiron%2529%26tr%3Dudp%253A%252F%252Ftracker.openbittorrent.com%253A80%26tr%3Dudp%253A%252F%252Ftracker.publicbt.com%253A80%26tr%3Dudp%253A%252F%252Ftracker.istole.it%253A6969%26tr%3Dudp%253A%252F%252Ftracker.ccc.de%253A80%26tr%3Dudp%253A%252F%252Fopen.demonii.com%253A1337" width="622" height="51" frameborder="0" scrolling="no"></iframe>
    <br /><br />    <div class="download">
            <a style='background-image: url("/static/img/icons/icon-magnet.gif");' href="magnet:?xt=urn:btih:01dd6b7325c3db5f0df5bbe510fd3fd9738d1c88&dn=Markus+Schulz+-+Global+DJ+Broadcast+%282013-07-11%29+%28Inspiron%29&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.istole.it%3A6969&tr=udp%3A%2F%2Ftracker.ccc.de%3A80&tr=udp%3A%2F%2Fopen.demonii.com%3A1337" title="Get this torrent">&nbsp;Get this torrent</a> 

                <a style='background-image: url("/static/img/icon-https.gif");' href="http://adexprt.me/get/Markus_Schulz_-_Global_DJ_Broadcast_%282013-07-11%29_%28Inspiron%29?tag=bal" title="Anonymous Download">&nbsp;Anonymous Download</a>
    </div>
        <div>(Problems with magnets links are fixed by upgrading your <a href="http://www.bitlordapp.com/d/btl1/?sr=irm&chnl=details" target="_blank">torrent client</a>!)</div>

    <div class="nfo">
<pre>=======================================================
Site: http://www.inspirontrance.com/
=======================================================


=======================================================
F B Page: Inspiron Trance
=======================================================


=======================================================
TWITTER : inspiron22
======================================================= 


Markus Schulz
01. Mobil - One Morning (Aleksey Sladkov Remix)
02. Store N Forward - Nuts
03. Alter Future vs. Holbrook &#38; SkyKeeper - Megapolis
04. Danilo Ercole - Cruzer
05. Aaron Camz - Emission
06. Markus Schulz Featuring Sarah Howells - Tempted
07. M.I.K.E. Presents Caromax - Inner Thoughts
08. Ruffault - Progressive Dream
09. Styller - What We Left Behind
10. Meridian - Exit
11. Lange - A Different Shade of Crazy
12. Tucandeo Featuring Natalie Gioia - Disappear (Xtigma Remix)
13. Sebastian Weikum - Sky is the Limit
14. Markus Schulz - Don&#39;t Leave Until the Sunrise

Guy J
01. Roger Martinez &#38; Secret Cinema - Menthol Raga (Guy J Remix)
02. Ambassador - The Fade (Guy J Remix)
03. Guy J - Seven
04. Echomen &#9516;&#251; Perpetual (Guy J Remix)

Back with Markus Schulz
15. Mauro Picotto &#38; Riccardo Ferri - New Time, New Place (New World Punx Remix)
16. Grube &#38; Hovsepian - Trickster
17. Nifra - Waves
18. Markus Schulz featuring Dauby - Perfect (Digital X Remix) [Global Selection]
19. Basil O&#39;Glue - Gilgamesh
20. Skytech - The Other Side
21. ID


Enjoy
(Inspiron)      </pre>
    </div>

I am using the code below, but it returns the text of the whole details block instead of just the 'seeders' and 'leechers' values.

try {
    document = Jsoup.connect(BLOG_URL).get();
    title = document.title();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
// selector query
Elements nodeBlogStats = document.select("div#details");
// check results
if (nodeBlogStats.size() > 0) {
    // get value
    result = nodeBlogStats.get(0).text();
}

1 answer:

Answer 0 (score: 0)

According to http://jsoup.org/apidocs/org/jsoup/select/Selector.html, you are looking for

  

E ~ F: an F element preceded by sibling E

:contains(text): elements that contain the specified text

I would try

Element seeders = document.select("dt:contains(Seeders) ~ dd").get(0);
Element leechers = document.select("dt:contains(Leechers) ~ dd").get(0);
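
To try the idea without a network call, here is a minimal, self-contained sketch that runs against a trimmed copy of the markup from the question. The class name TorrentStats and the helper valueFor are hypothetical names introduced only for this illustration. The sketch also makes two changes to the lines above, both assumptions about what is wanted rather than requirements: it uses the adjacent-sibling combinator (E + F) so that only the <dd> directly after the matching <dt> is selected, and it calls first(), which returns null on an empty result, where get(0) would throw.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TorrentStats {

    // Hypothetical helper: returns the text of the <dd> that directly follows
    // the <dt> containing the given label, or null if no such pair exists.
    // The label is pasted into the selector, so it must not contain parentheses.
    static String valueFor(Document doc, String label) {
        // "dt:contains(label) + dd" matches a dd whose immediately preceding
        // sibling element is a dt containing the label text (case-insensitive).
        Element dd = doc.select("div#details dt:contains(" + label + ") + dd").first();
        return dd == null ? null : dd.text();
    }

    public static void main(String[] args) {
        // Trimmed copy of the markup from the question.
        String html = "<div id=\"details\"><dl class=\"col1\">"
                + "<dt>Seeders:</dt><dd>16</dd>"
                + "<dt>Leechers:</dt><dd>1</dd>"
                + "</dl></div>";
        Document doc = Jsoup.parse(html);

        System.out.println("Seeders: " + valueFor(doc, "Seeders"));
        System.out.println("Leechers: " + valueFor(doc, "Leechers"));
    }
}
```

In the real app, the Document would come from Jsoup.connect(BLOG_URL).get() as in the question. Checking the returned value for null before using it covers pages where a label is missing.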