Question

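I need to download the HTML content of a URL. The problem is that the URL takes a while to load, so I have to wait for some time (~10-15 seconds) before logging the content. I tried two approaches to achieve this, but neither produces the expected result.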

The first approach uses setTimeout:

var page = require('webpage').create()
page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Unable to load the address!');
        phantom.exit();
    } else {
        window.setTimeout(function () {
            console.log(page.content);
            phantom.exit();
        }, 10000);  
    }
});

But setTimeout does not wait for the timeout I specify. Whatever value I pass, the script times out after a fixed period that is shorter than the page's load time.

The second approach uses onLoadFinished:

var page = new WebPage(), testindex = 0, loadInProgress = false;

page.onConsoleMessage = function(msg) {
    console.log(msg)
};

page.onLoadStarted = function() {
    loadInProgress = true;
    console.log("load started");
};

page.onLoadFinished = function() {
    loadInProgress = false;
    console.log("load finished");
};

var steps = [
    function() {
        page.open("url");
    },

    function() {
        console.log(page.content);
    }
];


interval = setInterval(function() {
    if (!loadInProgress && typeof steps[testindex] == "function") {
        console.log("step " + (testindex + 1));
        steps[testindex]();
        testindex++;
    }
    if (typeof steps[testindex] != "function") {
        console.log("test complete!");
        phantom.exit();
    }
}, 5000);

With this approach, onLoadFinished fires before the whole page has loaded.

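I'm new to PhantomJS, and both solutions above also come from Stack Overflow. Am I missing something important? Is there another way to achieve the same result? (I also tried the waitFor construct, without success.)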

Answer 1

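OK, the problem is reading the content after a timeout. If you are waiting for a particular DOM element, you need the well-known waitFor function. But if you only want to get the page content after a fixed delay, it's much easier. Here's how: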

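var page = require("webpage").create();
var address = "http://someadress.com/somepath/somearticle";
var timeout = 10 * 1000; // wait 10 s after the load event before reading the DOM

// Read the current body markup from inside the page context.
function getContent() {
    return page.evaluate(function () {
        return document.body.innerHTML;
    });
}

// Register the handler before opening the page.
page.onLoadFinished = function (status) {
    if (status !== "success") {
        console.log("Unable to load the address!");
        phantom.exit(1);
        return;
    }
    setTimeout(function () {
        console.log(getContent());
        phantom.exit(); // without this, PhantomJS keeps running
    }, timeout);
};

page.open(address);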

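Note: if you are waiting for a large chunk of content to appear in the HTML body, use setInterval instead and poll until document.body.innerHTML grows beyond the size you expect.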
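The polling idea from the note above can be sketched as a small helper. This is a minimal, engine-agnostic sketch, not the published waitFor example: `waitFor`, `MIN_LENGTH` and the simulated `fakeBody` are hypothetical names for illustration, and the growing string stands in for the page body so the logic can run without PhantomJS.

```javascript
// Hypothetical polling helper in the spirit of the common "waitFor" idea.
// Calls check() every intervalMs until it returns true (then onReady)
// or until timeoutMs has elapsed (then onTimeout).
function waitFor(check, onReady, onTimeout, timeoutMs, intervalMs) {
    var start = Date.now();
    var timer = setInterval(function () {
        if (check()) {
            clearInterval(timer);
            onReady(Date.now() - start);
        } else if (Date.now() - start >= timeoutMs) {
            clearInterval(timer);
            onTimeout();
        }
    }, intervalMs || 250);
}

// Simulated page body that grows over time. In PhantomJS this would be
// page.evaluate(function () { return document.body.innerHTML; }).
var fakeBody = "";
var grower = setInterval(function () { fakeBody += "<p>chunk</p>"; }, 50);

var MIN_LENGTH = 100; // hypothetical threshold for "enough" content

waitFor(
    function () { return fakeBody.length > MIN_LENGTH; },
    function (elapsed) {
        clearInterval(grower);
        console.log("content ready after ~" + elapsed + " ms, length " + fakeBody.length);
    },
    function () {
        clearInterval(grower);
        console.log("gave up waiting");
    },
    15000, // give up after 15 s, matching the ~10-15 s from the question
    100
);
```

In a PhantomJS script the condition would read the real page instead, for example `function () { return page.evaluate(function () { return document.body.innerHTML.length; }) > MIN_LENGTH; }`, and `onReady` would print `page.content` and call `phantom.exit()`. For the DOM-element case, the condition could return `page.evaluate(function () { return document.querySelector("#some-id") !== null; })`, where `#some-id` is a placeholder selector. Polling with an upper bound avoids guessing a single fixed delay.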
