Traversing an HTML file to get href values

Date: 2019-11-03 10:40:01

Tags: javascript html node.js cheerio

My HTML file looks like this:

<div id="sidebar" style="top: 100px;">
    <div class="items">
        <div class="item hentry selected" itemscope="" itemtype="http://schema.org/BlogPosting" data-id="3714235398193725034">

            <img class="thumbnail" src="http://4.bp.blogspot.com/-FLnjwm6youQ/UUGhQei8KqI/AAAAAAAAAUE/nEl-5V5IcDw/s30-p/1.jpg" style="width: 30px; height: 30px;">

            <h3 class="title entry-title" itemprop="name">


    <a href="http://mywebsiteurl/2013/03/blog-post.html" rel="bookmark" itemprop="url">art1</a>

  </h3>

        </div>
        <div class="item hentry" itemscope="" itemtype="http://schema.org/BlogPosting" data-id="179325489509322215">
.
.
.
      </div>
  </div>
</div>

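The HTML has a div with the ID sidebar.

Under it is another div with the class items.

Under that there are multiple divs with the class item.

Under each div with the class item, I have an h3 with the class title.

Under the h3 tag, I have an 'a' tag.

I need to get the href value of the 'a' tag under every div with the class item.

I would appreciate some help with this.

Thanks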

3 answers:

Answer 0: (score: 1)

Try this with inline jQuery:

$.each($("#sidebar .items .item h3 a"),function(a,b){console.log($(b).attr("href"));});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div id="sidebar" style="top: 100px;">
    <div class="items">
        <div class="item hentry selected" itemscope="" itemtype="http://schema.org/BlogPosting" data-id="3714235398193725034">

            <img class="thumbnail" src="http://4.bp.blogspot.com/-FLnjwm6youQ/UUGhQei8KqI/AAAAAAAAAUE/nEl-5V5IcDw/s30-p/1.jpg" style="width: 30px; height: 30px;">

            <h3 class="title entry-title" itemprop="name">


    <a href="http://mywebsiteurl/2013/03/blog-post.html" rel="bookmark" itemprop="url">art1</a>

  </h3>

        </div>
        <div class="item hentry" itemscope="" itemtype="http://schema.org/BlogPosting" data-id="179325489509322215">
           <img class="thumbnail" src="http://4.bp.blogspot.com/-FLnjwm6youQ/UUGhQei8KqI/AAAAAAAAAUE/nEl-5V5IcDw/s30-p/1.jpg" style="width: 30px; height: 30px;">

            <h3 class="title entry-title" itemprop="name">


    <a href="http://example.com" rel="bookmark" itemprop="url">art2</a>

  </h3>
      </div>
  </div>
</div>
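
For pages where jQuery is not loaded, the same selector can be used with the standard `querySelectorAll` method. The following is a minimal sketch, not part of the original answer; `collectHrefs` is a hypothetical helper name introduced here for illustration.

```javascript
// Hypothetical helper: collect the href attribute of every anchor
// matching `selector` under `root` (the document or any element).
function collectHrefs(root, selector) {
  // getAttribute('href') returns the value as written in the markup,
  // which matches what jQuery's .attr("href") returns.
  return Array.from(root.querySelectorAll(selector), (a) => a.getAttribute('href'));
}

// Only runs where a DOM is available (for example, in a browser).
if (typeof document !== 'undefined') {
  console.log(collectHrefs(document, '#sidebar .items .item h3 a'));
}
```

Note that the `a.href` property returns the fully resolved URL, while `getAttribute('href')` returns the attribute exactly as written, so the two can differ for relative links.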

Answer 1: (score: 0)

You can first use getElementsByClassName to get all divs with the class item, then use getElementsByTagName on each of those divs to find all the anchor tags under it.

const itemDivs = [...document.getElementsByClassName('item')];

const hrefs = [];
itemDivs.forEach(div => {
    const anchors = [...div.getElementsByTagName('a')];
    if (anchors && anchors.length > 0) {
        anchors.forEach(a => hrefs.push(a.href));
    }
});

console.log(hrefs); // prints ["http://mywebsiteurl/2013/03/blog-post.html"]

Answer 2: (score: 0)

You can try using the DOMParser API:

let html = `<div id="sidebar" style="top: 100px;">
    <div class="items">
        <div class="item hentry selected" itemscope="" itemtype="http://schema.org/BlogPosting" data-id="3714235398193725034">
            <img class="thumbnail" src="http://4.bp.blogspot.com/-FLnjwm6youQ/UUGhQei8KqI/AAAAAAAAAUE/nEl-5V5IcDw/s30-p/1.jpg" style="width: 30px; height: 30px;">
            <h3 class="title entry-title" itemprop="name">
    <a href="http://mywebsiteurl/2013/03/blog-post.html" rel="bookmark" itemprop="url">art1</a>
  </h3>
        </div>
        <div class="item hentry" itemscope="" itemtype="http://schema.org/BlogPosting" data-id="179325489509322215">
      </div>
  </div>
  <div class = 'item'>
   <a  href='http://example1.com'></a>
  </div>
  <div class = 'noitem'>
   <a  href='http://example2.com'></a>
  </div>
</div>`

let parser = new DOMParser()
let parsed = parser.parseFromString(html, 'text/html')

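// Descendant selector: the anchors in the question sit inside an h3, so '.item > a' (direct child) would miss them
let anchors = [...parsed.querySelectorAll('.item a')]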

let hrefs = anchors.map(v=> v.href)

console.log(hrefs)
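
DOMParser is a browser API and is not built into Node.js. Since the question is tagged node.js and cheerio, here is a rough sketch of the same idea using cheerio's jQuery-like interface. It is not taken from any of the answers: `extractItemHrefs` is a hypothetical helper name, `page.html` is a placeholder file name, and the sketch assumes cheerio has been installed with `npm install cheerio`.

```javascript
// Hypothetical helper: `load` is expected to be cheerio's `load` function.
function extractItemHrefs(load, html) {
  const $ = load(html);
  // Same selector as the jQuery answer; .map(...).get() turns the
  // matched set into a plain array of href strings.
  return $('#sidebar .items .item h3 a')
    .map((i, el) => $(el).attr('href'))
    .get();
}

// Usage in Node.js (assumes cheerio is installed and page.html exists):
//   const fs = require('fs');
//   const cheerio = require('cheerio');
//   console.log(extractItemHrefs(cheerio.load, fs.readFileSync('page.html', 'utf8')));
```

Passing `load` in as a parameter keeps the helper free of a hard dependency on the package; calling `require('cheerio')` directly inside the function would work just as well.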