Why does Puppeteer seem to be randomizing my data?

Time: 2020-11-04 00:13:14

Tags: web-scraping puppeteer

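I am trying to scrape a website, but the scraper seems to randomize the data it returns. Sometimes it gives me all the data I ask for, and sometimes it does not. In the `page.evaluate` call for prices, it sometimes returns the correct data, but other times the values come back as `undefined`.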

    import puppeteer from "puppeteer"
    import useAddFirestore from "../hooks/useAddFirestore.js"

    export default async function nikeScraper(date){
        const browser = await puppeteer.launch({
            headless: false
        });
        const page = await browser.newPage();
        await page.setDefaultNavigationTimeout(0);
        await page.goto("https://www.nike.com/w/sale-shoes-3yaepzy7ok");

        const nikeData = []

        const titles = await page.evaluate(() => {
            const titles = document.querySelectorAll(".product-card__title")
            const titleList = [...titles]
            const text = titleList.map(title => title.innerText)
            return text
        })
        titles.forEach((el, i) => {
            nikeData[i] = {}
            nikeData[i].title = el
            nikeData[i].date = date
            nikeData[i].brand = "Nike"
        })

        const links = await page.evaluate(() => {
            const links = document.querySelectorAll(".product-card__img-link-overlay")
            const linksList = [...links]
            const href = linksList.map(link => link.href)
            return href
        })
        links.forEach((el, i) => {
            nikeData[i].link = el
        })

        const prices = await page.evaluate(() => {
            const prices = document.querySelectorAll(".product-price__wrapper")
            const priceList = [...prices]
            const text = priceList.map(price => price.innerText)
            return text
        })
        prices.forEach((el, i) => {
            const splitEl = el.split("\n")
            nikeData[i].sale = splitEl[0]
            nikeData[i].retail = splitEl[1]
        })

        const images = await page.evaluate(() => {
            const images = document.querySelectorAll("img")
            const imageList = [...images]
            const src = imageList.map(img => img.src).filter(src => src.includes("static.nike.com"))
            return src
        })
        images.forEach((el, i) => {
            nikeData[i].image = el
        })

        await browser.close();

        for(let entry of nikeData){
            useAddFirestore(entry)
        }
    }

I built an almost identical scraper for another website and it works every time, so I don't know why this one doesn't.
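One possible explanation, offered as a guess and not a confirmed diagnosis: if the page renders its product cards with client-side JavaScript, each separate `page.evaluate` call may see a different number of elements, and merging four independent lists by index `i` would then attach a price or image to the wrong title. A minimal sketch of an alternative is to wait for the cards and read every field from the same card in one pass. The `.product-card` container selector is an assumption (only the three inner selectors appear in the scraper above), and `collectCards` is a hypothetical helper name.

```javascript
// Hypothetical helper. When passed to page.evaluate() it runs inside the page,
// where `doc` defaults to the page's own `document`.
// ASSUMPTION: each product sits in a ".product-card" container; the three
// inner selectors are taken from the scraper above.
function collectCards(doc = document) {
    return [...doc.querySelectorAll(".product-card")].map(card => {
        // Read innerText of a child element, or "" if the child is missing.
        const text = selector => {
            const el = card.querySelector(selector);
            return el ? el.innerText : "";
        };
        const link = card.querySelector(".product-card__img-link-overlay");
        const img = card.querySelector("img");
        // Drop empty pieces so a card without a price yields null values.
        const priceParts = text(".product-price__wrapper")
            .split("\n")
            .filter(Boolean);
        return {
            title: text(".product-card__title"),
            link: link ? link.href : null,
            image: img ? img.src : null,
            sale: priceParts[0] || null,
            retail: priceParts[1] || null,
        };
    });
}

// Assumed usage with Puppeteer (not executed here):
//   await page.waitForSelector(".product-card__title");
//   const cards = await page.evaluate(collectCards);
```

Because every field is read from the same card element, a missing price or image leaves a `null` in that entry only and cannot shift the values of the entries after it.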

Example of the data returned:

    {
        title: 'ZX 2K 4D SHOES',
        brand: 'Adidas',
        image: 'https://assets.adidas.com/images/w_385,h_385,f_auto,q_auto:sensitive,fl_lossy/d071967e4a624b11a32eabb300e7a801_9366/zx-2k-4d-shoes.jpg',
        sale: '',
        retail: undefined
    }
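The `sale: ''` and `retail: undefined` pair is what `split("\n")` produces when the price element's `innerText` is an empty string: the result is a one-element array holding `''`, so index 1 does not exist. The sketch below shows that behavior and one way to normalize it. `parsePriceText` is a hypothetical helper, not part of the scraper, and the dollar amounts are made-up sample values.

```javascript
// Hypothetical helper (not in the original scraper): split a price string
// into sale and retail parts, using null for anything that is missing.
function parsePriceText(text) {
    const parts = String(text || "")
        .split("\n")
        .map(part => part.trim())
        .filter(part => part.length > 0);
    return {
        sale: parts.length > 0 ? parts[0] : null,
        retail: parts.length > 1 ? parts[1] : null,
    };
}

// What the plain split does with an empty string: [""], so index 1 is missing.
const raw = "".split("\n");
console.log(raw[0] === "", raw[1] === undefined); // true true

// Made-up sample values, for illustration only.
console.log(parsePriceText("$80\n$100"));
console.log(parsePriceText(""));
```

An empty price string at scrape time would be consistent with the element existing before its text has been rendered, though the sample alone does not confirm that.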

0 answers:

No answers yet.