I'm scraping a website with CasperJS, and one of the tasks involves crawling URLs that are built from a for-loop counter. The URLs look like this:
www.example.com/page/no=
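where no is a number from 0 to 10 set by the for-loop counter. The scraper then goes through every page, scrapes the data into a JSON object, and repeats until no = 10.
The data I want is stored in separate groups on each page. What I want is a single JSON object that joins the scraped output of every page.
Imagine page 1 has Expense1 and the object I get is {Expense1}, and page 2 has Expense2 and the object I get is {Expense2}. At the end of the scrape I want one JSON object like this: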
scrapedData = {
    "expense1": expense1,
    "expense2": expense2
}
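What I'm having trouble with is combining all of the JSON objects into one array.
I initialize an empty array and then push each object onto it. I tried checking whether the loop counter i equals 10 and printing the JSON object at that point, but that doesn't seem to work. From what I've read, "object spread" might help, but I'm not sure how to use it in this case.
Any pointers would help. Should I be using an array function such as map?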
casper.then(function(){
    var url = "https://example.net/secure/SaFinShow?url=";
    //We create a for loop to go open the urls
    for (i=0; i<11; i++){
        this.thenOpen(url+ i, function(response){
            expense_amount = this.fetchText("td[headers='amount']");
            Date = this.fetchText("td[headers='Date']");
            Location = this.fetchText("td[headers='zipcode']");
            id = this.fetchText("td[headers='id']");
            singleExpense = {
                "Expense_Amount": expense_amount,
                "Date": Date,
                "Location": Location,
                "id": id
            };
            if (i ===10){
                expenseArray.push(JSON.stringify(singleExpense, null, 2))
                this.echo(expenseArray);
            }
        });
    };
});
Answer 0 (score: 0)
Taking your example and extending it, you should be able to do something like this:
// Initialize empty object to hold all of the expenses
var scrapedData = {};
casper.then(function(){
    var url = "https://example.net/secure/SaFinShow?url=";
    //We create a for loop to go open the urls
    for (i=0; i<11; i++){
        // thenOpen() only queues a step that runs after this loop has finished,
        // so pass the current value of i into its own scope as pageNo
        (function(pageNo){
            casper.thenOpen(url + pageNo, function(response){
                expense_amount = this.fetchText("td[headers='amount']");
                Date = this.fetchText("td[headers='Date']");
                Location = this.fetchText("td[headers='zipcode']");
                id = this.fetchText("td[headers='id']");
                singleExpense = {
                    "Expense_Amount": expense_amount,
                    "Date": Date,
                    "Location": Location,
                    "id": id
                };
                // As we loop over each of the expenses add them to the object containing all of them
                scrapedData['expense' + pageNo] = singleExpense;
            });
        })(i);
    }
});
Once this has run, scrapedData should have the following form:
scrapedData = {
    "expense1": expense1,
    "expense2": expense2
}
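Why the page number needs its own scope: in CasperJS, thenOpen() and then() only queue steps, and those steps run after the surrounding for loop has returned. A callback that reads i directly therefore sees its final value (11 in the question's loop), every page writes to the same key, and the question's if (i === 10) check never matches. The sketch below is plain JavaScript that runs without CasperJS; queueStep and runSteps are hypothetical helpers standing in for Casper's step queue, not CasperJS APIs.

```javascript
// Hypothetical stand-in for Casper's step queue: queueStep() only records a
// callback, and runSteps() runs the recorded callbacks later, the way
// thenOpen()/then() steps run after the enclosing loop has returned.
var steps = [];

function queueStep(fn) {
  steps.push(fn);
}

function runSteps() {
  for (var s = 0; s < steps.length; s++) {
    steps[s]();
  }
  steps = [];
}

// Buggy: every callback reads the shared `i` after the loop has finished (i === 3).
var buggy = {};
for (var i = 0; i < 3; i++) {
  queueStep(function () {
    buggy["expense" + i] = "page " + i;
  });
}
runSteps();

// Fixed: the IIFE gives each callback its own copy of the page number.
var fixed = {};
for (var j = 0; j < 3; j++) {
  (function (pageNo) {
    queueStep(function () {
      fixed["expense" + pageNo] = "page " + pageNo;
    });
  })(j);
}

// A step queued after the loop runs last, so it sees every result.
// In CasperJS this corresponds to casper.then(function () { this.echo(...); }).
var printed = null;
queueStep(function () {
  printed = JSON.stringify(fixed);
  console.log(printed); // {"expense0":"page 0","expense1":"page 1","expense2":"page 2"}
});
runSteps();

console.log(Object.keys(buggy).join(", ")); // expense3
```

The IIFE is used instead of let because PhantomJS, which CasperJS runs on, most likely does not support ES2015 block scoping; a named helper function taking the page number as a parameter works equally well.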
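One problem with the code above is that the variables assigned inside the loop callback should be local, declared with var; without var they become globals shared by every step. The variables also should not be named Date and Location, because those are built-in global names in JavaScript; assigning to Date, for example, replaces the Date constructor.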
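To see concretely what goes wrong with the global assignment: Date = this.fetchText(...) without var does not create a new variable. Date already exists as a global, so the assignment overwrites the constructor for the rest of the script. A minimal plain-JavaScript sketch, where the function names are hypothetical and the scraped text is replaced by a literal string:

```javascript
// Keep a reference so the constructor can be restored at the end.
var OriginalDate = Date;

function scrapeWithoutVar(text) {
  Date = text; // no `var`: overwrites the global Date constructor
}

function scrapeWithVar(text) {
  var date = text; // local variable; the global Date is untouched
  return date;
}

scrapeWithVar("2019-01-02");
var okAfterVar = typeof Date === "function";

scrapeWithoutVar("2019-01-02");
var brokenAfterAssignment = typeof Date === "string";

// Restore the constructor so the rest of the script keeps working.
Date = OriginalDate;

console.log(okAfterVar, brokenAfterAssignment); // true true
```

With var and a lowercase name, the scraped text stays in a local variable and new Date() keeps working elsewhere in the script.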
// Initialize empty object to hold all of the expenses
var scrapedData = {};
casper.then(function(){
    var url = "https://example.net/secure/SaFinShow?url=";
    //We create a for loop to go open the urls
    for (var i = 0; i < 11; i++){
        // Give each queued step its own copy of the page number; the steps run
        // after the loop ends, when i has already reached 11
        (function(pageNo){
            casper.thenOpen(url + pageNo, function(response){
                // Create local variables to hold this page's expense data
                var expense_amount = this.fetchText("td[headers='amount']");
                // Don't use `Date`; it is a JS built-in name
                var date = this.fetchText("td[headers='Date']");
                // Don't use `Location`; it is a JS built-in name
                var location = this.fetchText("td[headers='zipcode']");
                var id = this.fetchText("td[headers='id']");
                var singleExpense = {
                    "Expense_Amount": expense_amount,
                    "Date": date,
                    "Location": location,
                    "id": id
                };
                // As we loop over each of the expenses add them to the object containing all of them
                scrapedData['expense' + pageNo] = singleExpense;
            });
        })(i);
    }
});
// This step is queued after all of the thenOpen() steps, so every page has been scraped by now
casper.then(function(){
    this.echo(JSON.stringify(scrapedData, null, 2));
});
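On the question of map and array functions: pushing JSON.stringify(...) strings into an array and stringifying each one separately yields several JSON fragments rather than one document. If the per-page objects are first collected in an array, Array.prototype.reduce (ES5, so it should also be available under PhantomJS) can fold them into the single keyed object, which is then serialized once. Object spread ({...a, ...b}) is ES2018 syntax that PhantomJS's engine most likely cannot parse, so it is avoided here. The sample records below are made up for illustration.

```javascript
// Hypothetical per-page results, in the shape the scraper builds for one page.
var pageResults = [
  { Expense_Amount: "12.50", Date: "2019-01-02", Location: "10001", id: "a1" },
  { Expense_Amount: "40.00", Date: "2019-01-05", Location: "94105", id: "b2" }
];

// reduce() folds the array into one object; keys are numbered from 1 here
// to match the "expense1", "expense2" naming used in the question.
var scrapedData = pageResults.reduce(function (acc, expense, index) {
  acc["expense" + (index + 1)] = expense;
  return acc;
}, {});

// Serialize once, after all pages have been collected.
var output = JSON.stringify(scrapedData, null, 2);
console.log(output);
```

map() on its own would return another array, so reduce (or a plain loop that assigns into an object, as in the answer's code) is the better fit when the goal is one object rather than an array.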