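I set up a crawler with the URLs I want to crawl, and the actor is working; I tested it with the cookie/screenshot example. My only problem is passing the cookies from the actor to the crawler: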
    const Apify = require('apify');

    Apify.main(async () => {
        const input = await Apify.getValue('INPUT');

        const browser = await Apify.launchPuppeteer();
        const page = await browser.newPage();
        await page.goto('http://xy.com/login');

        // Login
        await page.type('#form_user_login_email', input.username);
        await page.type('#form_user_login_password', input.password);
        await page.evaluate(() => { document.querySelectorAll('.btn-full-width')[1].click(); });
        await page.waitForNavigation();

        // Get cookies
        const cookies = await page.cookies();

        // Use cookies in other tab or browser
        //const page2 = await browser.newPage();
        //await page2.setCookie(...cookies);
        // Get cookies after login

        const apifyClient = Apify.client;

        // call crawler with cookies
        const execution = await apifyClient.crawlers.startExecution({
            crawlerId: 'mhi',
            settings: {
                cookies: cookies
            }
        });

        console.log('Done.');

        console.log('Closing Puppeteer...');
        await browser.close();
    });
I think the cookies are not being passed, because the crawler is not logged in.
Answer 0 (score: 0)
Your code should work. You could try adding cookiesPersistence: 'OVER_CRAWLER_RUNS' to the settings. If you are not sure whether the cookies were passed, you can check with the API endpoint https://api.apify.com/v1/user_id/crawlers/crawler_id?token=api_apify_token&executionId=execution_id.
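As a minimal sketch of the settings suggestion above: `buildExecutionSettings` is a hypothetical helper (not part of the Apify SDK) that combines the cookies from `page.cookies()` with the `cookiesPersistence` value. Whether this setting alone makes the crawler log in has not been verified here.

```javascript
// Hypothetical helper: builds the `settings` object for startExecution().
// `cookies` is expected to be the array returned by Puppeteer's page.cookies().
function buildExecutionSettings(cookies) {
    return {
        cookies,
        // Value suggested in the answer; untested against a live crawler.
        cookiesPersistence: 'OVER_CRAWLER_RUNS',
    };
}

// Usage inside the actor (crawlerId taken from the question):
// const execution = await Apify.client.crawlers.startExecution({
//     crawlerId: 'mhi',
//     settings: buildExecutionSettings(cookies),
// });
```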
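However, you don't need to pass the cookies to the crawler at all; you can crawl directly in the actor with the Apify SDK. You only need to override the goto function of PuppeteerCrawler so that it sets the cookies before navigating. See the docs for PuppeteerCrawler.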
    const Apify = require('apify');

    Apify.main(async () => {
        const input = await Apify.getValue('INPUT');

        const browser = await Apify.launchPuppeteer();
        const page = await browser.newPage();
        await page.goto('http://xy.com/login');

        // Login
        await page.type('#form_user_login_email', input.username);
        await page.type('#form_user_login_password', input.password);
        await page.evaluate(() => { document.querySelectorAll('.btn-full-width')[1].click(); });
        await page.waitForNavigation();

        // Get cookies
        const cookies = await page.cookies();

        const crawler = new Apify.PuppeteerCrawler({
            // other PuppeteerCrawler options go here
            // (requestList or requestQueue, handlePageFunction, ...)
            gotoFunction: async ({ request, page }) => {
                // setCookie takes cookie objects as separate arguments, so spread the array
                await page.setCookie(...cookies);
                return page.goto(request.url);
            }
        });

        await crawler.run();

        console.log('Done.');

        console.log('Closing Puppeteer...');
        await browser.close();
    });
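The goto override can also be pulled out into a small factory so it can be checked without launching a browser. This is only a sketch: `makeGotoFunction` is a hypothetical name, and the stub `page` in the usage note stands in for a real Puppeteer page.

```javascript
// Hypothetical factory: returns a gotoFunction that sets the login cookies
// on each page before navigating to the requested URL.
function makeGotoFunction(cookies) {
    return async ({ request, page }) => {
        // page.setCookie() takes cookie objects as separate arguments,
        // so the array is spread.
        await page.setCookie(...cookies);
        return page.goto(request.url);
    };
}
```

It would be passed to the crawler as `gotoFunction: makeGotoFunction(cookies)`. Calling it with a stub page whose `setCookie` and `goto` record their arguments is a quick way to confirm the cookies arrive as individual objects rather than as a single array.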