I have a page that I need to parse:
<div class="shadowBox someOtherBox">
.
.
.
</div>
.
.
.
<div class="shadowBox other">
<h2>OTHERS</h2>
<ul>
<li>
<a href="/link/to/something/1" target="_self">TITLE #1</a>
</li>
<li>
<a href="/link/to/something/2" target="_self">TITLE #2</a>
</li>
<li>
<a href="/link/to/something/3" target="_self">TITLE #3</a>
</li>
</ul>
</div>
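I want to get every link inside <div class="shadowBox other">, together with its title. I have tried several different approaches, but none of them worked. Here is the code from one of my attempts: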
function parse(crn)
{
request("LINK_OF_PAGE", function(error, response, html)
{
if(!error)
{
var $ = cheerio.load(html);
var title, news_url, url_hash;
var json = { title : "", news_url : ""};
var links = [];
var data = $('div').filter('.shadowBox').last();
//var data = $('.shadowBox.other').children('ul').children('li').children('a');
console.log(data);
news_url = data.prev().text();
url_hash = md5(news_url);
}
});
}
Why doesn't my logic work? How can I achieve my goal?
Answer 0 (score: 0)
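It looks like you are trying to populate the links array with the href and text values of the anchor elements. Your attempt fails because .last() selects the <div> itself and .prev() then moves to its previous sibling, so .text() never reads the links inside the div. Select the anchors directly instead: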
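// Collect every anchor inside the "shadowBox other" div.
var links = $('.shadowBox.other li a').map(function(){
var $this = $(this);
// title is the link text, news_url is the href attribute
return { title : $this.text(), news_url : $this.attr('href') };
}).get(); // .get() converts the cheerio collection into a plain array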
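The question also computes an MD5 hash of each URL. The following is a minimal sketch of how that step might be applied to every entry of the links array, assuming each entry holds its URL in news_url. The question's md5() helper is not shown, so Node's built-in crypto module stands in for it here; addUrlHashes and the hard-coded sample data (copied from the question's HTML) are hypothetical and only for illustration.

```javascript
// Node's built-in crypto module is used in place of the question's md5() helper (an assumption).
const crypto = require('crypto');

// Sample data shaped like the array the answer builds; values come from the question's HTML.
const links = [
  { title: 'TITLE #1', news_url: '/link/to/something/1' },
  { title: 'TITLE #2', news_url: '/link/to/something/2' },
  { title: 'TITLE #3', news_url: '/link/to/something/3' }
];

// Hypothetical helper: returns a copy of each entry with an added url_hash field.
function addUrlHashes(entries) {
  return entries.map(function (entry) {
    // Hex-encoded MD5 digest of the URL string (32 hex characters).
    const url_hash = crypto.createHash('md5').update(entry.news_url).digest('hex');
    return Object.assign({}, entry, { url_hash: url_hash });
  });
}

const hashedLinks = addUrlHashes(links);
console.log(hashedLinks);
```

Returning copies rather than mutating the entries keeps the original links array unchanged, which may be convenient if it is reused elsewhere.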