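I'm trying to scrape some news articles from a web page, but I have to click through each page of results.
I wrote some code to do this, but the problem is that I have to wait for the XHR request to finish before I can extract the new data.
An XHR request is fired each time I click through to a new page.
In plain Chrome JavaScript, without Selenium or Node.js, how can I wait for all of Chrome's XHR requests to finish?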
// Keep track of how many pages were scraped
var pagesScraped = 0;

// Gets the next page on the list of news articles.
function getNextPage() {
    var pagination = document.querySelectorAll('#pagination_top > div > ul > li > a');
    pagination[pagination.length - 1].click();
    getNewsArticleTitles();
}

// Extracts the news article titles
function getNewsArticleTitles() {
    // Wait for network pending to complete?
    var articles = document.querySelectorAll('#hits > div > div > div > div > article > div.fxs_floatingMedia_textBody > h4 > a');
    for (var i = 0; i < articles.length; i++) {
        console.log(articles[i].innerHTML);
    }
    pagesScraped++;
    if (pagesScraped < 68) {
        getNextPage();
    }
}
Answer 0 (score: 0)
When scraping the web, I usually wait for some visible change in the DOM and treat that as the sign that the page has loaded and it is safe to continue. If you want a pure JS approach, here is one way to do it:

function checkIfNewPageLoaded() {
    // you can update this method as needed
    var pageNo = document.querySelector("span.current_page_number");
    // wait for the displayed page number to increment past the pages already scraped
    var success = pageNo && Number(pageNo.innerText.trim()) > pagesScraped;
    return success;
}

var checkExist = setInterval(function() {
    if (checkIfNewPageLoaded()) {
        console.log("Proceed with your execution of next page");
        clearInterval(checkExist);
        getNewsArticleTitles(); // get new articles now
    }
}, 500); // check every 500ms

If you want to wait until every Ajax request in the document has finished, however many there are, I think you can use jQuery's ajaxStop event instead.
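The ajaxStop idea from this answer could look roughly like the sketch below. It is only a sketch under assumptions: the page must already load jQuery and send its requests through jQuery's Ajax layer, because ajaxStop does not see raw XMLHttpRequest or fetch calls. The helper name waitForAjaxStop is made up for illustration and is not part of jQuery.

```javascript
// Sketch only. Assumes the page already loads jQuery and issues its requests
// through jQuery; ajaxStop is not triggered by raw XMLHttpRequest or fetch.
// waitForAjaxStop is a hypothetical helper name, not a jQuery API.
function waitForAjaxStop($, doc, onStop) {
    // jQuery triggers "ajaxStop" on the document once no jQuery Ajax
    // requests remain pending. .one() removes the handler after its first
    // run, so each registration leads to at most one callback.
    $(doc).one('ajaxStop', onStop);
}

// In the page's console (where jQuery and document exist) it might be wired
// into the question's code like this:
//
//   function getNextPage() {
//       var pagination = document.querySelectorAll('#pagination_top > div > ul > li > a');
//       waitForAjaxStop(jQuery, document, getNewsArticleTitles); // register first
//       pagination[pagination.length - 1].click();               // then trigger the XHR
//   }
```

Registering the handler before the click avoids missing the event if the request finishes very quickly. The direct call to getNewsArticleTitles() inside getNextPage() would be dropped in this variant, since the handler now makes that call.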
I haven't tested ajaxStop myself, but it is known to work. That said, I can't think of a case where my first solution would fail.