Traversing the DOM with jQuery

Asked: 2015-04-17 18:15:56

Tags: javascript jquery html css web-scraping

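I am trying to scrape a lyrics website for a project of mine, and I have run into some jQuery problems. I can get the artist and song names, but the lyrics are wrapped in a div that is hard to pull data out of. I have posted a sample of the HTML, my code, and one iteration of the loop as it was logged.

Basically, I am trying to use the div with the inline style to pull in all of the lyrics, but the object I print to the console takes the form shown below. I thought I could call this.prev().data() inside the map function, but it doesn't seem to work. Any insight or references on how to parse this the right way would be greatly appreciated!

Thanks!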

HTML

<div id="main">
<div class="...">...</div>
<h2>ARTIST</h2>
<div class="...">...</div>
<b>"SONG"</b>
<br>
<br>
<div style="margin-left:10px;margin-right:10px;">
    <!--start of lyrics -->
    "
    lyric1"
    <br>
    "
    lyric2"
    <br>
    "
    lyric3"
    <br>
    "lyric4"
    etc...
    <!-- end of lyrics -->
</div>
</div>

CODE

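    var request = require('request');
    var cheerio = require('cheerio');

    // `url` is assumed to be defined elsewhere
    request(url, function (error, response, html) {
        if (!error) {
            var $ = cheerio.load(html);
            var artist, song, lyrics;
            var json = { artist: "", song: "", lyrics: [] };

            $('#main').filter(function () {
                var data = $(this);
                // <h2> holds the artist, <b> holds the quoted song title
                artist = data.find('h2').text().replace(' LYRICS', '');
                song = data.find('b').text().replace(/["]+/g, '');
                // log every child element of the lyrics div (7th child of #main)
                var lines = data.children().eq(6).children().map(function () {
                    console.log(this);
                    console.log("<<<<<<<<<<<<<<<<<<<<<<<<<<<<");
                });
            });
        }
    });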

An item printed by console.log inside the map function

{ type: 'tag',
      name: 'br',
      attribs: {},
      children: [],
      next:
       { data: '\nI remember you was conflicted, misusing your influence\r\n',
         type: 'text',
         next:
          { data: ' end of lyrics ',
            type: 'comment',
            next: [Object],
            prev: [Circular],
            parent: [Object] },
         prev: [Circular],
         parent:
          { type: 'tag',
            name: 'div',
            attribs: [Object],
            children: [Object],
            next: [Object],
            prev: [Object],
            parent: [Object] } },
      prev:
       { data: '\nWe want the funk',
         type: 'text',
         next: [Circular],
         prev:
          { type: 'tag',
            name: 'br',
            attribs: {},
            children: [],
            next: [Circular],
            prev: [Object],
            parent: [Object] },
         parent:
          { type: 'tag',
            name: 'div',
            attribs: [Object],
            children: [Object],
            next: [Object],
            prev: [Object],
            parent: [Object] } },
      parent:
       { type: 'tag',
         name: 'div',
         attribs: { style: 'margin-left:10px;margin-right:10px;' },
         children:
          [ bunch of objects within arrays and one [Circular] ]
         next:
          { data: '\r\n\r\n',
            type: 'text',
            next: [Object],
            prev: [Circular],
            parent: [Object] },
         prev:
          { data: '\r\n\r\n',
            type: 'text',
            next: [Circular],
            prev: [Object],
            parent: [Object] },
         parent:
          { type: 'tag',
            name: 'div',
            attribs: [Object],
            children: [Object],
            next: [Object],
            prev: [Object],
            parent: [Object] } } }

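1 Answer:

Answer 0 (score: 0)

You want to use contents() to get the text nodes, and map() to turn them into an array. Returning null from the map callback excludes that entry from the array.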

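// contents() returns text and comment nodes as well as elements
var elems = $('#main').children().eq(6).contents();
var text = elems.map(function () {
    // text() is empty for <br> elements and comment nodes
    var ln = $.trim($(this).text().replace(/["\r\n]/g, ""));
    // returning null drops the entry from the result
    return (ln.length) ? ln : null;
}).get();
console.log(text);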
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<div id="main">
  <div class="...">...</div>
  <h2>ARTIST</h2>
  <div class="...">...</div>
  <b>"SONG"</b>
  <br>
  <br>
  <div style="margin-left:10px;margin-right:10px;">
    <!--start of lyrics -->
    "
    lyric1"
    <br>
    "
    lyric2"
    <br>
    "
    lyric3"
    <br>
    "lyric4"
    <!-- end of lyrics -->
  </div>
</div>
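
The question uses cheerio rather than browser jQuery, so the same idea can also be applied directly to the node objects shown in the console output, where each lyric line is a node with type 'text' and its content in data. The following is only a minimal sketch under that assumption: extractLines and the lyricsDiv sample are hypothetical names made up for illustration, and the sample is hand-built to imitate the logged shape rather than real parser output.

```javascript
// Hypothetical helper: collect cleaned lines from the text-node children
// of a parsed node shaped like the objects in the console output above.
function extractLines(node) {
    return (node.children || [])
        // keep text nodes only; this skips <br> tags and comment nodes
        .filter(function (child) { return child.type === 'text'; })
        // strip double quotes and line breaks, then surrounding whitespace
        .map(function (child) { return child.data.replace(/["\r\n]/g, '').trim(); })
        // drop entries that contained whitespace only
        .filter(function (line) { return line.length > 0; });
}

// Hand-built sample imitating the lyrics <div> (an assumption, not real output).
var lyricsDiv = {
    type: 'tag',
    name: 'div',
    children: [
        { type: 'text', data: '\n    ' },
        { type: 'comment', data: 'start of lyrics ' },
        { type: 'text', data: '\n    "\n    lyric1"\n    ' },
        { type: 'tag', name: 'br', children: [] },
        { type: 'text', data: '\n    "\n    lyric2"\n    ' },
        { type: 'tag', name: 'br', children: [] },
        { type: 'text', data: '\n    "lyric3"\n    ' },
        { type: 'comment', data: ' end of lyrics ' },
        { type: 'text', data: '\n' }
    ]
};

var lines = extractLines(lyricsDiv);
console.log(lines);
```

With cheerio, the node passed in would presumably be the underlying element of the lyrics div, for example the result of get(0) on the selection, assuming its children carry the same type and data fields as the logged objects.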