我有一个简单的node.js脚本来捕获一些网页的屏幕截图。看来我在使用async / await的过程中被绊倒了,但我不知道在哪里。我目前正在使用puppeteer v1.11.0。
libunity.a libzmq.a libzmq.so libzmq.so.5 libzmq.so.5.2.2
答案 0 :(得分:4)
当系统磁盘已满时,我遇到了相同的错误。
答案 1 :(得分:3)
Navigation failed because browser has disconnected
错误通常意味着启动Puppeteer的节点脚本结束而没有等待Puppeteer动作完成。因此,正如您所说的,这是一个等待的问题。
关于您的脚本,我进行了一些更改以使其起作用:
1-首先,您不等待stepThru
函数的(异步)结束,所以请更改
stepThru();
到
await stepThru();
和
puppeteer.launch({devtools:false}).then(function(browser){
到
puppeteer.launch({devtools:false}).then(async function(browser){
(我添加了async
)
2-我更改了您管理goto
和pagce.once
承诺的方式
现在PDF承诺
new Promise(async function(resolve, reject){
//screenshot on first console message
page.once("console", async () => {
await page.pdf({path: paper + '.pdf', printBackground:true, width:'1024px', height:'768px', margin: {top:"0px", right:"0px", bottom:"0px", left:"0px"} });
resolve();
});
})
它只负责PDF的创建。
3-然后我用page.goto
Promise.all
和PDF许诺
await Promise.all([
page.goto(url, {"waitUntil":["load", "networkidle2"]}),
new Promise(async function(resolve, reject){
// ... pdf creation as above
})
]);
4-我将page.close
之后的Promise.all
await Promise.all([
// page.goto
// PDF creation
]);
await page.close();
resolve();
现在就可以了,这里是完整的工作脚本
const puppeteer = require('puppeteer');
//a list of sites to screenshot
const papers =
{
nytimes: "https://www.nytimes.com/",
wapo: "https://www.washingtonpost.com/"
};
//launch puppeteer, do everything in .then() handler
puppeteer.launch({devtools:false}).then(async function(browser){
//create a load_page function that returns a promise which resolves when screenshot is taken
async function load_page(paper){
const url = papers[paper];
return new Promise(async function(resolve, reject){
const page = await browser.newPage();
await page.setViewport({width:1024, height: 768});
await Promise.all([
page.goto(url, {"waitUntil":["load", "networkidle2"]}),
new Promise(async function(resolve, reject){
//screenshot on first console message
page.once("console", async () => {
await page.pdf({path: paper + '.pdf', printBackground:true, width:'1024px', height:'768px', margin: {top:"0px", right:"0px", bottom:"0px", left:"0px"} });
resolve();
});
})
]);
await page.close();
resolve();
})
}
//step through the list of papers, calling the above load_page()
async function stepThru(){
for(var p in papers){
if(papers.hasOwnProperty(p)){
//wait to load page and screenshot before loading next page
await load_page(p);
}
}
await browser.close();
}
await stepThru();
});
请注意:
-我将networkidle0
更改为networkidle2
是因为nytimes.com网站花费很长时间才能获得0网络请求状态(由于AD等)。您显然可以等待networkidle0
,但这取决于您,这超出了您的问题范围(在这种情况下,增加page.goto
的超时时间)
-www.washingtonpost.com
站点出现TOO_MANY_REDIRECTS
错误,因此我更改为washingtonpost.com
,但我认为您应该对此进行更多调查。为了测试脚本,我在nytimes
网站和其他网站上使用了更多的时间。再说一次:这超出了您的问题范围
让我知道是否需要更多帮助