How can I load the data without clicking the button?

Time: 2019-08-28 23:23:17

Tags: python ajax selenium web-scraping data-extraction

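I want to scrape the names of all startups from https://e27.co/startups/. By default the page shows 20 startup names; to load more, you click the "Load More" button, and each click loads 10 more names.

I wrote a Python script that clicks the "Load More" button until all (29,000) startups are loaded. This takes a lot of time and RAM. How can I load this data without clicking?

I have heard this can be done by calling the AJAX request directly, but I don't know how to implement that.

HTML code of the button: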

<button class="button btn-load-more" data-start="0">Load More</button>

The data-start attribute increases by 10 with each click.

Event handler code for the button (JS):

        startupList.elem.find('.btn-load-more').off('.click').click(function(){
            startupList.elem.find('.btn-load-more').addClass('hide');
            Global.loading();
            startupList.loadMoreIsClicked = true;
            var start = $(this).attr('data-start')*1;
            start += startupList.count;
            $(this).attr('data-start', start);
            startupList.searchAndFilterResult(start, startupList.getFormData("#startup_search"), false);
        });
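To relate this handler to the request it triggers, here is a rough Python sketch of its offset arithmetic. It assumes startupList.count is 10 (suggested by the +10 steps in data-start, but not shown in the snippet), and load_more_offsets is a hypothetical helper name, not something from the site's code.

```python
def load_more_offsets(initial_start=0, count=10, clicks=3):
    """Yield the `start` value passed to the search call on each click.

    Mirrors the handler: read data-start, add count, write it back.
    count=10 is an assumption; the snippet does not show its value.
    """
    start = initial_start
    for _ in range(clicks):
        start += count
        yield start


print(list(load_more_offsets()))  # [10, 20, 30]
```

Each click therefore only advances an offset, which is why the same data can in principle be requested without a browser.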

My Python code:

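    import time

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    # Methods of the scraper class (the class statement was not included in the post).
    def __init__(self):
        opp = Options()
        opp.add_argument('--blink-settings=imagesEnabled=false')  # skip image loading
        opp.add_argument('--headless')
        # chrome_options is deprecated in Selenium 3 in favour of options.
        self.driver = webdriver.Chrome('./chromedriver', options=opp)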

    def parse(self, e27_url = "https://e27.co/startups/"):
        self.driver.get(e27_url)
        time.sleep(3)
        run_check, prev_value_list = True, [0, 0]
        button = self.driver.find_element_by_xpath("//button[@class='button btn-load-more']")

        while run_check:
            quantity_of_loaded_starttups =  len(self.driver.find_elements_by_xpath(
                        "//div[@class='startup-block startup-list-item']"))
            print('Loading, {} startups loaded'.format(quantity_of_loaded_starttups))
            prev_value_list.append(quantity_of_loaded_starttups)
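            timer = 0.0
            # Wait up to 60 seconds for the button to reappear after the last click.
            while not button.is_displayed() and timer < 60:
                time.sleep(0.1)
                timer += 0.1
            if not button.is_displayed():
                # Timed out. A float counter never equals 60 exactly, so compare with <.
                break

            button.click()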

            if prev_value_list[-2] == prev_value_list[-1] and  prev_value_list[-3]  == prev_value_list[-1]:
                run_check = False


        # Extract the name and profile link from every loaded startup block.
        for item in self.driver.find_elements_by_xpath("//div[@class='startup-block startup-list-item']"):
            name = item.find_element_by_css_selector('.company-name').text
            e27url = item.find_element_by_css_selector(".startuplink").get_attribute("href")

            yield {"Startup": name, "Url": e27url}

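You can visit e27.co/startups and check for yourself.

Thanks, qwew

1 answer:

Answer 0: (score: 0)

You can find out where the request goes when the Load More button is pressed and then call that API directly. In this case, the request is sent to the following URL.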

https://e27.co/api/startups/?tab_name=recentlyupdated&start=10&length=10
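The query string carries three parameters: tab_name, start, and length. As a small sketch of how such a URL could be assembled for any page, the snippet below uses the standard library's urlencode; build_page_url and API_URL are illustrative names, and the parameter meanings (start as an offset, length as a page size) are inferred from the URL rather than from any API documentation.

```python
from urllib.parse import urlencode

API_URL = "https://e27.co/api/startups/"


def build_page_url(start, length=10, tab_name="recentlyupdated"):
    """Return the listing URL for one page of results."""
    query = urlencode({"tab_name": tab_name, "start": start, "length": length})
    return f"{API_URL}?{query}"


print(build_page_url(10))
# https://e27.co/api/startups/?tab_name=recentlyupdated&start=10&length=10
```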

So by changing the length and start parameters you can fetch more results. I have written a simple script that gets the startup names.

import requests

# Request the first 100 startups; raise start_number to page through the rest.
start_number = 0
r = requests.get('https://e27.co/api/startups/?tab_name=recentlyupdated&start={}&length=100'.format(start_number))
data = r.json()
for i in data['data']['list']:
    print(i['name'])

# Output:
RESYNC Technologies
Swizzle
Sports365
ShopClues
Symantec
SpoonJoy
SEOPRO India
Solarium
SHOPLINE
Structo
Coc Coc
CarDekho
Chillr
Culture Machine
CoAssets
CoinMKT
CimplyFive
Call Levels
CereBrahm Innovations
CouponzGuru
Aisle
adMingle
AppsFlyer
AppVirality
Ambient Digital
Airtel
Apptopia
Latize
Lefora
LINC 360
LogisticsIndonesia
LogicGateOne Corporation
Livspace
LivePhuket
LINE Ventures
National Tiles-Sydney
National Tiles-Brisbane
National Research Foundation
National Tiles
National Tiles-Adelaide
National University of Singapore School of Computing
National Tiles-Wagga Wagga
National Tiles-Springwood
National Tiles-Burleigh Heads
Nationkart
Natasha
Naturally Yours
Native5
Nativfy
NaturalMantra
Native Tongue
NewsHunt
Nimble Wireless
Nanarokom.com
NoBroker
News Corp
Naxos International
NecesCity
NextGen
Notey
Naspers Group
NAM TRIP TRAVEL
Navigat Group
Nanosatisfi
Naaptol
Single Thailand
sinhasoft
Sinergy
Singsys Pte. Ltd.
Simplilearn
SIFS India
Simprosys InfoMedia
SimiCommerce
SingPost
Singapore Press Holdings
SimplerCloud
SingSaver
Sinoze
Singapore infocomm Technology Federation
Native Tech
Novelship
AthenaDesk
ZERO BrandCard™
Open24.vn
iMyanmarHouse
Shufti Pro
MobME Wireless
Moolya Testing
Mofang Gongyu
Moff Inc.
Moonfrog Labs
myNoticePeriod
MaGIC
Momoe
Manthan
Metaps
Motorola Solutions
MatchMove
Mondano
MOL- Money Online
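The script above prints only the first 100 names. To walk the whole listing, the same request could be repeated with an increasing start value. The following is a minimal sketch, not a tested client for this site: it uses the standard library's urllib instead of requests so that it has no dependencies, it assumes the API returns an empty list once start passes the last entry, and the names fetch_page and iter_startup_names as well as the User-Agent header are my own choices.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://e27.co/api/startups/"


def fetch_page(start, length=100):
    """Fetch one page and return the list found under data -> list."""
    query = urlencode({"tab_name": "recentlyupdated", "start": start, "length": length})
    # Some sites reject urllib's default User-Agent; this header is a guess.
    request = Request(f"{API_URL}?{query}", headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)["data"]["list"]


def iter_startup_names(page_fetcher=fetch_page, length=100, max_pages=None):
    """Yield names page by page until an empty page or max_pages is reached."""
    start, pages = 0, 0
    while max_pages is None or pages < max_pages:
        page = page_fetcher(start, length)
        if not page:  # assumption: an empty list marks the end of the data
            break
        for item in page:
            yield item["name"]
        start += length
        pages += 1
```

Calling list(iter_startup_names(max_pages=2)) would fetch at most two pages, which is a reasonable way to experiment before requesting everything. With roughly 29,000 startups and length=100, a full run would take about 290 requests. The page_fetcher argument exists so the loop can be exercised with a stand-in function instead of the network.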