Cheerio scraper is slow

Time: 2019-06-24 22:15:01

Tags: cheerio

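I built a scraper for Yelp. It works, but it is very slow. First I load a results page and collect all of the listing URLs, then I visit each listing to collect the company details.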

I'm new to JavaScript and cheerio, so I may be doing something wrong.
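One detail in the listing-collection step: the code below builds absolute URLs by string concatenation (`https://www.yelp.com${href}`), which yields `https://www.yelp.comundefined` when an element has no link. A minimal sketch of a safer approach, assuming the hrefs have already been extracted into an array: the standard `URL` constructor (a global in Node.js 10+) resolves relative and absolute hrefs against a base, and a `Set` drops duplicates. `toAbsoluteUrls` is a hypothetical helper, not part of cheerio, and the sample hrefs are made up.

```javascript
// Hypothetical helper: turn raw href attributes into unique absolute URLs.
// Assumes `hrefs` came from something like $(el).find('h3 a').attr('href').
function toAbsoluteUrls(hrefs, base = 'https://www.yelp.com') {
  const seen = new Set();
  for (const href of hrefs) {
    // attr('href') returns undefined when the element has no link; skip it
    if (!href) continue;
    // new URL() resolves relative paths against `base` and leaves
    // absolute URLs unchanged
    seen.add(new URL(href, base).href);
  }
  // A Set keeps insertion order, so the page order is preserved
  return [...seen];
}

// Example with made-up hrefs
const urls = toAbsoluteUrls([
  '/biz/some-cafe-san-francisco',
  undefined,
  '/biz/some-cafe-san-francisco',
  'https://www.yelp.com/biz/another-shop',
]);
console.log(urls);
```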

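const axios = require('axios');
const cheerio = require('cheerio');
const chalk = require('chalk'); // chalk v5+ is ESM-only; require() needs chalk@4 or earlier

// Shared state used by the functions below
const listings = [];
let pageCounter = 0;
const pageLimit = 15; // example value: number of result pages to crawl

// Main function for scraping
const getDetails = async (url) => {
    try {
        // Download the results page and load it into cheerio
        const response = await axios.get(url);
        const $ = cheerio.load(response.data);

        // Collect the link of every listing on this page
        // (.each is cheerio's iterator; .map was only used for its side effect)
        $('div.businessName__373c0__1fTgn').each((i, element) => {
            const href = $(element).find('h3').children('a').attr('href');
            const listing = `https://www.yelp.com${href}`;
            console.log(`found ${listing}`);
            listings.push(listing);
        });

        // Find the link to the next page
        const nextPageLink = $('.pagination-link--current__373c0__37ym9')
            .parent().parent().parent()
            .next('div').find('a').attr('href');
        pageCounter++;

        // Once pageLimit pages have been read (or there is no next page),
        // scrape the company info. Note: no semicolon after the condition;
        // `if (...);` ends the if statement, so the block would always run.
        if (pageCounter === pageLimit || !nextPageLink) {
            await scrapeDetailsPage(listings);
            return;
        }

        const page = `https://www.yelp.com${nextPageLink}`;
        console.log(chalk.cyan(`  Scraping: ${page}`));

        // Continue with the next page; await it so the calls run in order
        await getDetails(page);
    } catch (error) {
        console.log(error);
        await scrapeDetailsPage(listings);
    }
};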


const details = [];

// Scrape each listing's details page, one request at a time
const scrapeDetailsPage = async (listings) => {
    for (const detailsLink of listings) {
        try {
            const response = await axios.get(detailsLink);
            const $ = cheerio.load(response.data);

            // trim() strips surrounding whitespace and newlines;
            // replace('\n', '') only removed the first newline
            const name = $('.biz-page-title').text().trim();
            const category = $('.category-str-list').text().trim();
            const phone = $('.biz-phone').text().trim();
            const website = $('.biz-website').find('a').text().trim();
            const websiteLink = $('.biz-website').find('a').attr('href');
            const address = $('address').text().trim();

            console.log(`Getting ${name} details`);

            // Record the company info as a plain object
            details.push({ name, category, phone, website, websiteLink, address });
        } catch (error) {
            // Log the failure and move on instead of abandoning the remaining listings
            console.log(`Failed to scrape ${detailsLink}: ${error.message}`);
        }
    }

    // Export the collected details to a JSON file
    // (exportResults is defined elsewhere in my script)
    exportResults(details);
    console.log(`There are ${details.length} records`);
};
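The `getDetails` function above walks the result pages by recursion. Each page's next link is only known after that page has been downloaded, so this stage is inherently sequential, but a plain loop is easier to follow than recursion. A minimal sketch, assuming a hypothetical `fetchResultsPage(url)` that returns `{ listingUrls, nextUrl }` (in the real scraper it would wrap `axios.get` plus the cheerio selectors); here it is stubbed with an in-memory map so the example runs offline.

```javascript
// Walk result pages sequentially: each page's "next" link is only known
// after that page has been downloaded, so this stage cannot run in parallel.
async function collectListingUrls(startUrl, fetchResultsPage, pageLimit) {
  const listings = [];
  let url = startUrl;
  for (let page = 0; page < pageLimit && url; page++) {
    const { listingUrls, nextUrl } = await fetchResultsPage(url);
    listings.push(...listingUrls);
    url = nextUrl; // null/undefined ends the loop when there is no next page
  }
  return listings;
}

// Hypothetical in-memory stand-in for the result pages
const fakePages = {
  '/search?start=0': { listingUrls: ['/biz/a', '/biz/b'], nextUrl: '/search?start=10' },
  '/search?start=10': { listingUrls: ['/biz/c'], nextUrl: '/search?start=20' },
  '/search?start=20': { listingUrls: ['/biz/d'], nextUrl: null },
};
const fakeFetch = async (url) => fakePages[url];

// Stop after 2 pages even though a third exists
const firstTwoPages = collectListingUrls('/search?start=0', fakeFetch, 2);
firstTwoPages.then((urls) => console.log(urls));
```

The loop ends either at `pageLimit` or when a page has no next link, which covers both exit conditions without relying on an error being thrown.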

It took a little over 5 minutes to get 150 results. Just loading the URLs takes a long time. I've seen other scrapers that are blazing fast.
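A little over 5 minutes for 150 results is roughly 2 seconds per listing. Because `scrapeDetailsPage` awaits each `axios.get` before starting the next one, total time is about N × t (listings × time per request). The detail pages do not depend on each other, so several can be in flight at once; with k parallel requests the ideal time drops to about ⌈N / k⌉ × t, e.g. ⌈150 / 5⌉ × 2 s ≈ 60 s, ignoring any server-side throttling. A minimal sketch of a bounded-concurrency map follows. It is a common pattern, not a cheerio or axios API. `mapWithConcurrency` is a hypothetical helper name, and the demo uses a `setTimeout` stub instead of real HTTP requests.

```javascript
// Run `worker` over every item with at most `limit` calls in flight at once.
// Results keep the same order as `items`.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each runner repeatedly claims the next unprocessed index. The claim
  // (next++) happens synchronously between awaits, so no two runners
  // can take the same index.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}

// Demo: a fake "request" that takes 50 ms, tracking peak concurrency
let inFlight = 0;
let peak = 0;
const fakeGet = async (url) => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 50));
  inFlight--;
  return `html of ${url}`;
};

const urls = Array.from({ length: 12 }, (_, i) => `/biz/${i}`);
const demo = mapWithConcurrency(urls, 4, fakeGet).then((pages) => {
  console.log(pages.length, 'pages, peak concurrency', peak);
  return pages;
});
```

In the scraper, the worker would be the body of the loop in `scrapeDetailsPage` (fetch, `cheerio.load`, extract fields) wrapped in its own `try/catch`, so one failed page does not reject the whole batch, for example `await mapWithConcurrency(listings, 5, scrapeOneListing)` with a hypothetical `scrapeOneListing`. A modest limit such as 5 is an assumed starting point, not a value tested against Yelp.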

0 answers