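I am trying to scrape information from a website that splits its results across several pages. The code below works: it collects the title of every item in the list on the first page and saves the result to a JSON file. The problem is that the pagination links at the bottom of the page look like this: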
<a href="javascript:__doPostBack('some data1..','')">1</a>
<a href="javascript:__doPostBack('some data2..','')">2</a>
What I want to do is:
1. Load the first url
2. Click each pagination link at the bottom to visit that page (e.g. pages 1, 2, 3, 4, 5)
2a. On each page, I want to collect the information like I've done in the script below.
3. This can then either be saved per page or together in a json file like in the code below.
A solution should show how to load the page, then click through several pagination links and collect the information on each page.
var Nightmare = require('nightmare');
var nightmare = Nightmare({ show: true });
var fs = require('fs');
var config = require('./config.json');

nightmare
    .goto('some url...')
    .wait('table.gridlist')
    // inject() takes the type 'js' or 'css', not a file extension
    .inject('js', 'jquery.js')
    .evaluate(function () {
        // Runs inside the page: collect the title of each list row
        var json = [];
        $('table.gridlist tr.listitem-even').each(function () {
            var $tds = $(this).find('td');
            if ($tds.length) {
                var item = { title: '' };
                // The title is the first link in the second cell
                item.title = $tds.eq(1).find('a').eq(0).text();
                json.push(item);
            }
        });
        return json;
    })
    .end()
    .then(function (result) {
        // Name the output file after the current time (H-M-S)
        var dt = new Date();
        var time = dt.getHours() + "-" + dt.getMinutes() + "-" + dt.getSeconds();
        var filename = config.base_path + 'files/' + time + '.json';
        fs.writeFile(filename, JSON.stringify(result), function (err) {
            if (err) {
                return console.log(err);
            }
            console.log('Saved json data to ' + filename);
        });
        console.log(result);
    })
    .catch(function (error) {
        console.error('Search failed:', error);
    });
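
A minimal sketch of the flow described in steps 1 to 3, not verified against the real site. It rests on several assumptions: the text of each pager link is its page number, the first title on each page differs from the first title on the previous page (used here to detect that the new page has rendered), and the title is the first link in the second cell of each row. The names `extractTitles`, `clickPageLink`, `firstTitleChanged` and `scrapePages` are hypothetical helpers, not part of Nightmare. The sketch uses plain DOM calls instead of jQuery, on the assumption that a postback reloads the page and would drop an injected script.

```javascript
// Runs inside the page: one { title } object per list row.
function extractTitles() {
    var rows = document.querySelectorAll('table.gridlist tr.listitem-even');
    return Array.prototype.map.call(rows, function (row) {
        var link = row.querySelector('td:nth-child(2) a');
        return { title: link ? link.textContent.trim() : '' };
    });
}

// Runs inside the page: click the __doPostBack link whose text is the
// requested page number. Returns false when no such link exists.
function clickPageLink(pageNumber) {
    var links = document.querySelectorAll('a[href^="javascript:__doPostBack"]');
    for (var i = 0; i < links.length; i++) {
        if (links[i].textContent.trim() === String(pageNumber)) {
            links[i].click();
            return true;
        }
    }
    return false;
}

// Runs inside the page: true once the first title differs from the
// previous page's first title (assumed signal that the new page is shown).
function firstTitleChanged(previousTitle) {
    var link = document.querySelector(
        'table.gridlist tr.listitem-even td:nth-child(2) a');
    return !!link && link.textContent.trim() !== previousTitle;
}

// Visits pages 1..lastPage on an already-loaded page and resolves with
// [{ page: 1, items: [...] }, ...]. Stops early if a pager link is missing.
// `browser` is expected to behave like a Nightmare instance.
function scrapePages(browser, lastPage) {
    var pages = [];
    function visit(pageNumber) {
        return browser.evaluate(extractTitles).then(function (items) {
            pages.push({ page: pageNumber, items: items });
            if (pageNumber >= lastPage) {
                return pages;
            }
            return browser.evaluate(clickPageLink, pageNumber + 1)
                .then(function (clicked) {
                    if (!clicked) {
                        return pages;
                    }
                    var previousTitle = items.length ? items[0].title : '';
                    return browser.wait(firstTitleChanged, previousTitle)
                        .then(function () {
                            return visit(pageNumber + 1);
                        });
                });
        });
    }
    return visit(1);
}

// Possible usage with Nightmare (left as a comment, not executed here):
// nightmare.goto('some url...').wait('table.gridlist')
//     .then(function () { return scrapePages(nightmare, 5); })
//     .then(function (pages) {
//         // write `pages` to a JSON file as in the script above
//         return nightmare.end();
//     })
//     .catch(function (error) { console.error('Search failed:', error); });
```

Because the Nightmare instance is passed in as a parameter, `.end()` has to be called only after `scrapePages` resolves; calling it earlier, as the single-page script does, would close the browser before the remaining pages are visited.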