A reliable alternative to `time.sleep` [without it, the data can't be scraped]

Time: 2017-10-26 05:22:45

Tags: python css selenium selenium-webdriver web-scraping

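My job tends to work with time.sleep. However, I want something faster than time.sleep(2), because it is slow, and it stops working when the internet connection is slow or my laptop is running slowly.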

Full code here.

This works:

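import time
from random import shuffle
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait

# driver and options are defined earlier in the full script
indexes = list(range(len(options)))
shuffle(indexes)
for index in indexes:
    time.sleep(5)  # fixed pause before each page load
    driver.get('https://www.bet365.com.au/#/AS/B1/')
    clickMe = wait(driver, 10).until(EC.element_to_be_clickable((By.XPATH,'(//div[div/div/text()="Main Lists"]//div[starts-with(@class, "sm-CouponLink_Label") and normalize-space()])[%s]' % str(index + 1))))
    clickMe.click()
    time.sleep(3)  # fixed pause so the clicked page has time to render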

Setting the time.sleep calls to 0 means the job just finishes "successfully" [without scraping or doing anything].

Unfortunately, this:

EC.presence_of_element_located((By.css_selector, "#TopPromotionBetNow"))
WebDriverWait(driver, timeout).until(element_present) 

gives me an error.

clickMe = wait(driver, 10).until(EC.element_to_be_clickable((By.XPATH,'(//div[div/div/text()="Main Lists"]//div[starts-with(@class, "sm-CouponLink_Label") and normalize-space()

does not seem to have any effect.

Any ideas on how I can make the job scrape, navigate and click successfully, so that each page has fully loaded first?

2 answers:

Answer 0 (score: 0)

You can use `visibility_of_all_elements_located` as the expected condition:

langs2 = wait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, '//a[contains(@class, "tb_header-bar tb_")]')))
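Why an expected condition beats a fixed `time.sleep`: the condition is re-checked at a short interval and the wait ends as soon as it holds, so fast loads do not pay the full delay while slow loads still get up to `timeout` seconds. A stdlib-only sketch of that polling loop; `wait_until` is a hypothetical helper that illustrates the idea behind `WebDriverWait.until`, not Selenium's own code:

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    Hypothetical helper mirroring the idea of WebDriverWait.until.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # done as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)  # short pause, then check again
```

With Selenium you would not write this yourself; `wait(driver, 10).until(...)` does the equivalent, and the `timeout` is an upper bound rather than a fixed cost.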

Answer 1 (score: 0)

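There is no need to scrape/parse the page. The site has an API you can request the data from directly. You can inspect these requests with the devtools (F12) while the page loads.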

To list all markets for the England Premier League:

import requests

URI_COMPETITIONS = "https://services.topbetta.com.au/api/v2/combined/sports/competitions?sport_name=football"
URI_EVENTS = "https://services.topbetta.com.au/api/v2/combined/events/markets/selections?competition_id=%s"

# 1. Get the football competitions and pick out the England Premier League
response = requests.get(URI_COMPETITIONS, timeout=10).json()

for sport in response['data']:
    if sport['name'] == 'Football':

        for base_competition in sport['base_competitions']:
            if base_competition['name'] == 'England Premier League':

                # 2. For each competition id, fetch its events, markets and selections
                for info_competition in base_competition['competitions']:
                    events = requests.get(URI_EVENTS % info_competition['id'], timeout=10).json()

                    for competition in events['data']:
                        print(competition['name'])

                        for event in competition['events']:
                            print("  %s  %s" % (event['start_date'], event['name']))

                            for market in event['markets']:
                                for selection in market['selections']:
                                    print("    %s  %s" % (selection['name'], selection['price']))

Which gives:

England Premier League Round 26
  2018-02-06 07:00:00  Watford v Chelsea
    Watford  6
    Draw  3.8
    Chelsea  1.6
England Premier League Round 27
  2018-02-11 02:00:00  Everton v Crystal Palace
    Everton  2.4
    Draw  3.2
    Crystal Palace  3
  2018-02-11 23:00:00  Huddersfield Town v AFC Bournemouth
    Huddersfield Town  3
    Draw  3.2
    AFC Bournemouth  2.4
  2018-02-11 04:30:00  Manchester City v Leicester City
    Manchester City  1.2
    Draw  6.5
    Leicester City  13
...
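The nested loops can also be split into a pure function that turns one events response into the printed lines, which lets you check the parsing without network access. A sketch assuming the JSON shape implied by the keys the code above reads (`data`, `events`, `markets`, `selections`, `price`, ...); `format_events` and the sample payload are hypothetical, with values copied from the first round in the output:

```python
def format_events(payload):
    """Turn one events/markets/selections response into indented text lines.

    Assumes payload['data'] is a list of competitions, each holding
    'events' -> 'markets' -> 'selections', as read by the loops above.
    """
    lines = []
    for competition in payload['data']:
        lines.append(competition['name'])
        for event in competition['events']:
            lines.append("  %s  %s" % (event['start_date'], event['name']))
            for market in event['markets']:
                for selection in market['selections']:
                    lines.append("    %s  %s" % (selection['name'], selection['price']))
    return lines


# Hypothetical sample payload shaped like "England Premier League Round 26" above.
sample = {
    "data": [{
        "name": "England Premier League Round 26",
        "events": [{
            "start_date": "2018-02-06 07:00:00",
            "name": "Watford v Chelsea",
            "markets": [{"selections": [
                {"name": "Watford", "price": 6},
                {"name": "Draw", "price": 3.8},
                {"name": "Chelsea", "price": 1.6},
            ]}],
        }],
    }]
}

print("\n".join(format_events(sample)))
```

In the request loop, `print("\n".join(format_events(events)))` would then replace the inner loops.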