Getting empty values when parsing microdata

Date: 2015-09-11 08:44:17

Tags: java jsoup microdata

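I am trying to use the Jsoup library to parse an HTML source and extract the value of every itemtype, together with all the itemprop attributes that belong to it.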

Here is a sample of the HTML page body:

<body class="single single-post postid-2334 single-format-standard custom-header header-image header-full-width full-width-content" itemscope="itemscope" itemtype="http://schema.org/WebPage"><div class="site-container"><header class="site-header" role="banner" itemscope="itemscope" itemtype="http://schema.org/WPHeader"><div class="wrap"><div class="title-area"><p class="site-title" itemprop="headline"><a href="http://blogbasics.com/">Blog Basics</a></p><div id="title_image"><a href="http://blogbasics.com/" title="Blog Basics"><img src="http://blogbasics.com/wp-content/uploads/cropped-cropped-Win-1.png" title="Blog Basics" /></a><style>#title { display:none; }</style></div><p class="site-description" itemprop="description">Starting a blog? Learn how to make it amazing.</p></div></div></header><nav class="nav-primary" role="navigation" itemscope="itemscope" itemtype="http://schema.org/SiteNavigationElement"><div class="wrap"><ul id="menu-primary-navigation" class="menu genesis-nav-menu menu-primary"><li id="menu-item-2590" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-home menu-item-2590"><a title="Blog Basics" href="http://blogbasics.com">Home</a></li>
<li id="menu-item-3187" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-3187"><a href="http://blogbasics.com/blog">Blog</a></li>
<li id="menu-item-3722" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-3722"><a href="http://blogbasics.com/welcome">Free Updates</a></li>
<li id="menu-item-2578" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-2578"><a title="Blogging Tools" href="http://blogbasics.com/blogging-tools/">Blogging Tools</a></li>
</ul></div></nav><div class="site-inner"><div class="feature-area widget-area">
<div id="spyr_tru_notifybar-2" class="widget notify_bar"><div class="widget-wrap">Starting a blog? Learn how to make it awesome!</div></div>

<div id="spyr_tru_twocolumn-3" class="widget widget_spyr_tru_twocolumn"><div class="widget-wrap">
<div class="column one-half first original"><div align="middle"><a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/" target="_blank"><img src="https://lh3.ggpht.com/wMcDasimpny0oKmsIxI0xowRIxbFpaa9Rjg3aAUMG8UUtp4XamG03gYcGlsXTRmvkFqgySaihhq2_KCSr8cN7Q=s0"></a><script data-leadbox="14581e773f72a2:12e927026b46dc" data-url="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/" data-config="%7B%7D" type="text/javascript" src="https://curlcentric.leadpages.net/leadbox-910.js"></script></div>
</div>
<div class="column one-half last original"><p>Learn how to build a blog that generates traffic, revenue, & popularity in 30 days.</p>
<p>Just enter your email address in the box below and click "Submit".</p>
</div>
<div class="clear"></div>
</div></div>
<div id="spyr_tru_subscribesocial-2" class="widget feature-area-bottom tru_subscribe_social"><div class="widget-wrap">
<div class="tru_subscribesocial_wrap">
    <form action="http://www.aweber.com/scripts/addlead.pl" method="post" target="_blank">
        <div class="hidden_fields"><input type="hidden" name="meta_web_form_id" value="276964962" />
<input type="hidden" name="meta_split_id" value="" />
<input type="hidden" name="listname" value="awlist3567293" />
<input type="hidden" name="redirect" value="http://www.aweber.com/thankyou-coi.htm?m=text" id="redirect_f956eccce03104dc62dec5f8c897285e" />

<input type="hidden" name="meta_adtracking" value="Blog_Basics" />
<input type="hidden" name="meta_message" value="1" />
<input type="hidden" name="meta_required" value="email" />

<input type="hidden" name="meta_tooltip" value="" /></div>
        <input type="email" class="default_value" name="email" value="Enter email to get updates" /></span>
        <input type="submit" value="Submit" />
        </form>
    <div class="social_menu">
        <ul id="menu-social" class="menu superfish">

            </ul>
        </div>
    <div class="clear"></div>
    </div>
</div></div>
</div><div class="content-sidebar-wrap"><main class="content" role="main" itemprop="mainContentOfPage" itemscope="itemscope" itemtype="http://schema.org/Blog"><article class="post-2334 post type-post status-publish format-standard has-post-thumbnail category-blog-basics entry" itemscope="itemscope" itemtype="http://schema.org/BlogPosting" itemprop="blogPost"><header class="entry-header"><h1 class="entry-title" itemprop="headline">Examples of Blogs</h1> 
<p class="entry-meta">by <span class="entry-author" itemprop="author" itemscope="itemscope" itemtype="http://schema.org/Person"><span class="entry-author-name" itemprop="name">Kenneth Byrd</span></span> | Go from 0 to 5,000 blog subscribers in 60 days <a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/" rel="nofollow">(Click Here)</a></p></header><img src="http://blogbasics.com/wp-content/uploads/Examples-of-Blogs.jpg" width="5315" height="3543" alt="examples of blogs" title="" class="attachment-tru-post wp-post-image" /><div class="entry-content" itemprop="text"><h3>Overview</h3>
<p>This article includes examples of blogs from various niches. There are millions of example blogs out there in all different shapes and sizes. A good place to start is <a href="http://technorati.com/" target="_blank">Technorati</a>, a directory of blogs, or <a href="http://alltop.com/" target="_blank">Alltop</a>. Search these websites and then come back and tell us about the good blogs and the bad blogs that you found. Below are also more examples of blogs that you should look at:</p>
<h3><strong>Personal blogs</strong></h3>
<p><a title="Curl Centric" href="http://www.curlcentric.com/natural-hair-101/" target="_blank">Curl Centric</a>: Dedicated to providing healthy hair care information.</p>
<h3>Travel</h3>
<p><a href="http://boardingarea.com/" target="_blank">Boarding Area</a>:  A collection of bloggers on travel.  Range from personal stories to specific advice on airlines, hotels and places.</p>
<div class="content-box-yellow"><a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/">Learn how to start a blog that generates traffic, revenue, &amp; popularity.</a></div>
<p><a href="http://vivisrandomramblings.blogspot.com/" target="_blank">Vivi&#8217;s Random Ramblings</a>: A nice collection of random posts mostly demonstrating that Violy is a well-travelled, excellent photographer.</p>
<!-- Quick Adsense WordPress Plugin: http://quicksense.net/ -->
<div style="float:none;margin:5px 0 5px 0;text-align:center;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- Blog Basics - 300 x 250 -->
<ins class="adsbygoogle"
     style="display:inline-block;width:300px;height:250px"
     data-ad-client="ca-pub-5556427932737077"
     data-ad-slot="6553509385"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>

<p><a href="http://www.whygo.com/" target="_blank">Why go network of blogs</a>: Another group of travel bloggers.  Each blogger has their own patch, which range from Portland, which looks a nice city, to Iceland and France.</p>
<h3>Technical</h3>
<p><a href="http://techcrunch.com/" target="_blank">Techcrunch</a>:  This is the one to learn all about technology and in particular technology business, technology start-ups and gadgets.  You&#8217;ll usually hear the techie gossip here first.</p>
<p><a href="http://speckyboy.com/2010/02/25/50-amazing-personal-blog-web-designs/" target="_blank">Speckyboy.com</a>: Great blog on the design of websites.  Good on lists, (usually 50) of well researched examples of good or unusual design.  Gives even the least technical good ideas to discuss with their own designers.</p>
<h3>On Blogging</h3>
<p><a href="http://www.trafficgenerationcafe.com/" target="_blank">Traffic Generation Cafe</a>: Ana Hoffman&#8217;s very friendly, very knowledgeable blog on building traffic for your blog.</p>
<div class="content-box-yellow"><a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/">Learn how to start a blog that generates traffic, revenue, &amp; popularity.</a></div>
<p><a href="http://blogbasics.com/blog/" target="_blank">Blog Basics</a>: This website is a blog that is focused on topics like &#8216;how to blog&#8217; and &#8216;how to make money blogging&#8217;.</p>
<h3>Over to you</h3>
<p>Which blogs do you like?  Are you writing a blog?  Then tell us about it.</p>

<!-- Quick Adsense WordPress Plugin: http://quicksense.net/ -->
<div style="float:none;margin:5px 0 5px 0;text-align:center;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- Banner -->
<ins class="adsbygoogle"
     style="display:inline-block;width:468px;height:60px"
     data-ad-client="ca-pub-5556427932737077"
     data-ad-slot="1983708988"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>

<div style="font-size:0px;height:0px;line-height:0px;margin:0;padding:0;clear:both"></div><div style="clear:both;"></div><div id='ois-1' class='ois-design' ><div class="ois-outer ois-8-outer">
    <div class="ois-8-call-top"></div>
    <div class="ois-8-inner ois-inner">
        <div class="col-md-7 ois-8-left">
            <div class="ois-8-title">Get Exclusive Tips</div>
            <div class="ois-8-subtitle">Instantly discover how you can start a blog that generates traffic and income when you join the Blog Basics Tribe (It’s Free). Here's your chance. Just type in your email address.</div>
        </div> <!-- .span7 left side -->    
        <div class="col-md-5 ois-8-right">
            <div class="ois-8-img-wrapper">
                <img src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="https://lh3.ggpht.com/wMcDasimpny0oKmsIxI0xowRIxbFpaa9Rjg3aAUMG8UUtp4XamG03gYcGlsXTRmvkFqgySaihhq2_KCSr8cN7Q=s0" class="ois-img ois-8-img" /><noscript><img src="https://lh3.ggpht.com/wMcDasimpny0oKmsIxI0xowRIxbFpaa9Rjg3aAUMG8UUtp4XamG03gYcGlsXTRmvkFqgySaihhq2_KCSr8cN7Q=s0" class="ois-img ois-8-img" /></noscript>
            </div>
            <div class="ois-8-form">
                <form action="http://www.aweber.com/scripts/addlead.pl" method="post" id="ois-form-1" data-service="aweber" ><div id="ois-8-email-input-wrapper">
    <input type="text" name="email" class="ois-8-email-input ois-email-input ois-form-control" placeholder="Your Email"/>
</div>
<div id="ois-8-button-wrapper">
    <input type="submit" class="ois-btn ois-8-button" value="Submit"/>
</div><input type='hidden' name='listname' value='awlist3567293'/>
<input type='hidden' name='meta_message' value='1'/>
<input type='hidden' name='redirect' value='http://www.aweber.com/thankyou-coi.htm?m=video&e=example%40example.com&name=Example%20Subscriber&l=awlist3567293'/>
</form>
            </div> <!-- #ois-8-form -->
        </div><!-- .right .col-md-5 right side-->
        <div style="clear:both"></div>
    </div> <!-- inner -->
</div> <!-- outer --></div></div>
<div class="spyr_sliding_share">
    <div class="spyr_sliding_share_text">Share this article</div>
    <div class="spyr_sliding_share_wrap">
            <div class="spyr_sliding_share_button spyr_sb_facebook">
                <a href="#" class="icon icon-facebook"><span>Facebook</span></a>
                <div class="spyr_sb_inner"><div class="fb-like" data-href="http://blogbasics.com/examples-of-blogs/" data-send="false" data-layout="button_count" data-width="100" data-show-faces="false"></div></div>
                </div>
            <div class="spyr_sliding_share_button spyr_sb_twitter">
                <a href="#" class="icon icon-twitter"><span>Twitter</span></a>
                <div class="spyr_sb_inner"><a href="https://twitter.com/share" class="twitter-share-button" data-url="http://blogbasics.com/examples-of-blogs/" data-text="Examples of Blogs | Blog Basics" data-via="kbyrdjr">Tweet</a></div>
                </div>
            <div class="spyr_sliding_share_button spyr_sb_gplus">
                <a href="#" class="icon icon-gplus"><span>Google+</span></a>
                <div class="spyr_sb_inner"><div class="g-plusone" data-size="medium" data-href="http://blogbasics.com/examples-of-blogs/"></div></div>
                </div>
            <div class="spyr_sliding_share_button spyr_sb_pinterest">
                <a href="#" class="icon icon-pinterest"><span>Pinterest</span></a>
                <div class="spyr_sb_inner"><a href="http://pinterest.com/pin/create/button/?url=http://blogbasics.com/examples-of-blogs/&media=http://blogbasics.com/wp-content/uploads/Examples-of-Blogs-550x367.jpg&description=Examples of Blogs" class="pin-it-button" count-layout="horizontal"><img border="0" src="//assets.pinterest.com/images/PinExt.png" title="Pin It" /></a></div>
                </div>
            <div class="spyr_sliding_share_button spyr_sb_mail">
                <a href="#" class="icon icon-mail"><span>Email a Friend</span></a>
                <div class="spyr_sb_inner"><a href="mailto:?subject=Examples of Blogs&body=I found value in this and I think you will too.%0A%0AExamples of Blogs: http://blogbasics.com/examples-of-blogs/">Email a Friend</a></div>
                </div>
        </div>
    <div class="clear"></div>
    </div><footer class="entry-footer"></footer></article><div class="entry-comments" id="comments"><h3>Comments</h3><ol class="comment-list">
    <li class="comment even thread-even depth-1" id="comment-261">
    <article itemprop="comment" itemscope="itemscope" itemtype="http://schema.org/UserComments">


        <header class="comment-header">
            <p class="comment-author" itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Person">
                <img alt='' src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=48&#038;d=mm&#038;r=g" srcset='http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=96&amp;d=mm&amp;r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /><noscript><img alt='' src="http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=48&#038;d=mm&#038;r=g" srcset='http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=96&amp;d=mm&amp;r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /></noscript><span itemprop="name">violy</span> <span class="says">says</span>            </p>

            <p class="comment-meta">
                <time class="comment-time" datetime="2012-01-09T04:42:00+00:00" itemprop="commentTime"><a href="http://blogbasics.com/examples-of-blogs/#comment-261" class="comment-time-link" itemprop="url">January 9, 2012 at 4:42 am</a></time>            </p>
        </header>

        <div class="comment-content" itemprop="commentText">

            <p>Hi sir thank you so much for the nice compliment about my blog (Vivi&#8217;s Random Ramblings&#8221;), I&#8217;m blogging for not even 2 months now and it&#8217;s really overwhelming to see this compliment and getting a lot of good feedback  too and traffic which is a real surprise .. thank you so much!! &#8211; violy</p>
        </div>

        <div class="comment-reply"><a rel='nofollow' class='comment-reply-link' href='#comment-261' onclick='return addComment.moveForm( "comment-261", "261", "respond", "2334" )' aria-label='Reply to violy'>Reply</a></div>

    </article>
    <ul class="children">

    <li class="comment odd alt depth-2" id="comment-262">
    <article itemprop="comment" itemscope="itemscope" itemtype="http://schema.org/UserComments">


        <header class="comment-header">
            <p class="comment-author" itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Person">
                <img alt='' src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=48&#038;d=mm&#038;r=g" srcset='http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=96&amp;d=mm&amp;r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /><noscript><img alt='' src="http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=48&#038;d=mm&#038;r=g" srcset='http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=96&amp;d=mm&amp;r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /></noscript><span itemprop="name"><a href="http://blogbasics.com" class="comment-author-link" rel="external nofollow" itemprop="url">Paul Odtaa</a></span> <span class="says">says</span>            </p>

            <p class="comment-meta">
                <time class="comment-time" datetime="2012-01-09T09:44:00+00:00" itemprop="commentTime"><a href="http://blogbasics.com/examples-of-blogs/#comment-262" class="comment-time-link" itemprop="url">January 9, 2012 at 9:44 am</a></time>            </p>
        </header>

        <div class="comment-content" itemprop="commentText">

            <p>Hi Violy, </p>
<p>I really like your blog and your photography is great. </p>
        </div>

        <div class="comment-reply"><a rel='nofollow' class='comment-reply-link' href='#comment-262' onclick='return addComment.moveForm( "comment-262", "262", "respond", "2334" )' aria-label='Reply to Paul Odtaa'>Reply</a></div>

    </article>
    </li><!-- #comment-## -->
</ul><!-- .children -->
</li><!-- #comment-## -->

    <li class="comment even thread-odd thread-alt depth-1" id="comment-270">
    <article itemprop="comment" itemscope="itemscope" itemtype="http://schema.org/UserComments">


        <header class="comment-header">
            <p class="comment-author" itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Person">
                <img alt='' src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=48&#038;d=mm&#038;r=g" srcset='http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=96&amp;d=mm&amp;r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /><noscript><img alt='' src="http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=48&#038;d=mm&#038;r=g" srcset='http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=96&amp;d=mm&amp;r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /></noscript><span itemprop="name"><a href="http://allisondduncan.com" class="comment-author-link" rel="external nofollow" itemprop="url">Allison Duncan</a></span> <span class="says">says</span>            </p>

            <p class="comment-meta">
                <time class="comment-time" datetime="2012-01-20T21:17:00+00:00" itemprop="commentTime"><a href="http://blogbasics.com/examples-of-blogs/#comment-270" class="comment-time-link" itemprop="url">January 20, 2012 at 9:17 pm</a></time>           </p>
        </header>

        <div class="comment-content" itemprop="commentText">

            <p>Hi there,</p>
<p>Thanks for featuring my blog on your site. It&#8217;s always nice to see your work being appreciated and linked to.</p>
<p>I look forward to seeing what your site has coming down the pike.</p>
<p>Thanks for reading!</p>
<p>Allison</p>
        </div>

        <div class="comment-reply"><a rel='nofollow' class='comment-reply-link' href='#comment-270' onclick='return addComment.moveForm( "comment-270", "270", "respond", "2334" )' aria-label='Reply to Allison Duncan'>Reply</a></div>

    </article>
    </li><!-- #comment-## -->

I am using the jsoup library to parse the HTML and extract these values. This is the code I am trying:

try {
    Document doc = Jsoup.connect("http://blogbasics.com/examples-of-blogs/").get();

    Elements links = doc.select("itemtype > [itemprop]");

    for (Element element : links) {
        System.out.println(" itemprop :" + element.attr("itemprop"));
    }
} catch (IOException e) {
    e.printStackTrace();
}

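But I get empty values. I am new to this, so please let me know the correct code. If there is any other way to extract itemtype and itemprop from the HTML, sharing it would be very helpful.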

<div class="content-sidebar-wrap">
<main class="content" role="main" itemprop="mainContentOfPage" itemscope="itemscope" 
itemtype="http://schema.org/Blog"><article class="post-2334 post type-post status-publish 
format-standard has-post-thumbnail category-blog-basics entry" itemscope="itemscope" 
itemtype="http://schema.org/BlogPosting" itemprop="blogPost"><header class="entry-header">
<h1 class="entry-title" itemprop="headline">Examples of Blogs</h1> 
<p class="entry-meta">by <span class="entry-author" itemprop="author" itemscope="itemscope" 
itemtype="http://schema.org/Person"><span class="entry-author-name" itemprop="name">Kenneth Byrd</span></span> |
 Go from 0 to 5,000 blog subscribers in 60 days
 <a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/" rel="nofollow">(Click Here)</a>
 </p></header><img src="http://blogbasics.com/wp-content/uploads/Examples-of-Blogs.jpg" width="5315" height="3543" 
 alt="examples of blogs" title="" class="attachment-tru-post wp-post-image" /><div class="entry-content"
 itemprop="text"><h3>Overview</h3><p>This article includes examples of blogs
 from various niches. There are millions of example blogs out there in all 
 different shapes and sizes. A good place to start is 
 </p>

Expected output

itemtype="http://schema.org/Blog"
itemprop="mainContentOfPage"

itemtype="http://schema.org/BlogPosting" 
itemprop="blogPost"

itemtype="http://schema.org/Person"
itemprop="author"
itemprop="name"
itemprop="text"

1 answer:

Answer 0 (score: 1)

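I'm not sure what you really want, but it seems you need to select every element that has both an itemtype attribute and an itemprop attribute, plus every element that has only itemprop but is a direct child of an element with itemtype. If that is the case, you can use: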

String html = ""
        +"<div class=\"content-sidebar-wrap\">"
        +"<main class=\"content\" role=\"main\" itemprop=\"mainContentOfPage\" itemscope=\"itemscope\" "
        +"itemtype=\"http://schema.org/Blog\"><article class=\"post-2334 post type-post status-publish "
        +"format-standard has-post-thumbnail category-blog-basics entry\" itemscope=\"itemscope\" "
        +"itemtype=\"http://schema.org/BlogPosting\" itemprop=\"blogPost\"><header class=\"entry-header\">"
        +"<h1 class=\"entry-title\" itemprop=\"headline\">Examples of Blogs</h1> "
        +"<p class=\"entry-meta\">by <span class=\"entry-author\" itemprop=\"author\" itemscope=\"itemscope\" "
        +"itemtype=\"http://schema.org/Person\"><span class=\"entry-author-name\" itemprop=\"name\">Kenneth Byrd</span></span> |"
        +" Go from 0 to 5,000 blog subscribers in 60 days"
        +" <a href=\"https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/\" rel=\"nofollow\">(Click Here)</a>"
        +" </p></header><img src=\"http://blogbasics.com/wp-content/uploads/Examples-of-Blogs.jpg\" width=\"5315\" height=\"3543\" "
        +" alt=\"examples of blogs\" title=\"\" class=\"attachment-tru-post wp-post-image\" /><div class=\"entry-content\""
        +" itemprop=\"text\"><h3>Overview</h3><p>This article includes examples of blogs"
        +" from various niches. There are millions of example blogs out there in all "
        +" different shapes and sizes. A good place to start is "
        +" </p>"
        ;

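// Parse the HTML string; the second argument is the base URI (empty here)
Document doc = Jsoup.parse(html, "");

// Match elements that carry both attributes, or itemprop elements directly under an itemtype element
Elements els = doc.select("*[itemtype][itemprop], *[itemtype] > *[itemprop]");
for (Element el : els) {
    // Print the itemtype, set off by a blank line, only when the element has one
    System.out.print(el.attr("itemtype").isEmpty() ? "" : ("\n" + el.attr("itemtype") + "\n"));
    System.out.println(el.attr("itemprop"));
}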

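The important part is the JSoup CSS selector *[itemtype][itemprop], *[itemtype] > *[itemprop], which breaks down as follows:

  1. *[itemtype][itemprop] selects elements that have both attributes.

  2. *[itemtype] > *[itemprop] selects elements with an itemprop attribute that are direct children of an element with an itemtype attribute. If you want to allow all descendants, not just direct children, omit the > and leave a space between the two parts.

  3. The comma between the selectors acts as an "OR", so every element that matches any of the listed selectors is returned.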
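
As for why the selector in the question returned nothing: in a CSS query a bare word such as itemtype is read as a tag name, so "itemtype > [itemprop]" looks for children of an <itemtype> element, and the page has none. Attributes have to go in square brackets. The following is only a minimal sketch to compare the variants side by side. It assumes jsoup is on the classpath, and the SelectorComparison class, the itemprops helper, and the shortened HTML string are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class SelectorComparison {

    // Shortened, hypothetical stand-in for the page markup in the question.
    static final String HTML =
            "<main itemprop=\"mainContentOfPage\" itemscope itemtype=\"http://schema.org/Blog\">"
          + "<article itemprop=\"blogPost\" itemscope itemtype=\"http://schema.org/BlogPosting\">"
          + "<header><h1 itemprop=\"headline\">Examples of Blogs</h1></header>"
          + "<div itemprop=\"text\">Overview</div>"
          + "</article></main>";

    // Collects the itemprop value of every element matched by the given CSS query.
    static List<String> itemprops(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        List<String> values = new ArrayList<>();
        for (Element el : doc.select(cssQuery)) {
            values.add(el.attr("itemprop"));
        }
        return values;
    }

    public static void main(String[] args) {
        // Bare word = tag name: looks for children of an <itemtype> element.
        System.out.println(itemprops("itemtype > [itemprop]"));
        // Brackets = attribute: direct children of elements that have itemtype.
        System.out.println(itemprops("[itemtype] > [itemprop]"));
        // Space instead of ">": any descendant of an element that has itemtype.
        System.out.println(itemprops("[itemtype] [itemprop]"));
    }
}
```

With this sample markup the first query matches nothing, the second picks up only blogPost and text, and the third also picks up headline, because the h1 sits inside a header rather than directly under the article.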