Wait-until does not wait for the element to load, so the data output is incorrect

Date: 2017-12-26 10:40:29

Tags: python selenium xpath web-scraping

Why do I get the output I want when I add time.sleep(2), but fewer results when I instead wait until a specific XPath is present?

Output with time.sleep(2) (the desired output):

Adelaide Utd
Tottenham
Dundee Fc
 ...

Count: 145 names

With time.sleep removed:

Adelaide Utd
Tottenham
Dundee Fc
 ...

Count: 119 names

I have added:

clickMe = wait(driver, 13).until(EC.element_to_be_clickable(
    (By.CSS_SELECTOR,
     "#page-container > div:nth-child(4) > div > "
     "div.ubet-sports-section-page > div > div:nth-child(2) > div > div > "
     "div:nth-child(1) > div > div > div.page-title-new > h1")))

since this element appears on all pages.

It still collects far fewer names. How can I fix this?
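A likely cause (my reading of the symptoms, not confirmed against the site): `element_to_be_clickable` on the page title succeeds as soon as that single header exists, which can be well before the name rows finish rendering, so the scrape starts early; `time.sleep(2)` just happens to outlast the rendering. One common workaround is a custom wait condition that succeeds only once the number of matched elements stops changing between polls. The `CountStable` class below is my own sketch of that idea (the name is made up); it works because `WebDriverWait.until` accepts any callable that takes the driver.

```python
# Hedged sketch (my own helper, not part of Selenium): a wait condition
# that only succeeds once the number of elements matching a locator is
# non-zero and unchanged between two successive polls.

class CountStable:
    """Callable condition: the match count must repeat before we return."""

    def __init__(self, locator):
        self.locator = locator   # e.g. (By.CSS_SELECTOR, "... > span")
        self.last_count = -1

    def __call__(self, driver):
        elements = driver.find_elements(*self.locator)
        if elements and len(elements) == self.last_count:
            return elements      # stable: until() returns the elements
        self.last_count = len(elements)
        return False             # not stable yet: WebDriverWait keeps polling
```

With this, the header wait could be replaced by something like `langs0 = wait(driver, 13, poll_frequency=1).until(CountStable((By.CSS_SELECTOR, "... > span")))`, which hands back the name elements themselves once their count has settled.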

The script:

import csv

from selenium import webdriver
from selenium.common.exceptions import (StaleElementReferenceException,
                                        TimeoutException)
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait


driver = webdriver.Chrome()
driver.maximize_window()

driver.get('https://ubet.com/sports/soccer')

# Wait for the sport selector, then collect its options.
clickMe = wait(driver, 10).until(EC.element_to_be_clickable(
    (By.XPATH, '//select[./option="Soccer"]/option')))
options = driver.find_elements_by_xpath('//select[./option="Soccer"]/option')

for index in range(len(options)):
    try:
        try:
            zz = wait(driver, 10).until(EC.element_to_be_clickable(
                (By.XPATH, '(//select/optgroup/option)[%s]' % str(index + 1))))
            zz.click()
        except StaleElementReferenceException:
            pass

        # Wait for the page title, which appears on every page.
        clickMe = wait(driver, 10).until(EC.element_to_be_clickable(
            (By.CSS_SELECTOR,
             "#page-container > div:nth-child(4) > div > "
             "div.ubet-sports-section-page > div > div:nth-child(2) > div > "
             "div > div:nth-child(1) > div > div > div.page-title-new > h1")))

        langs0 = driver.find_elements_by_css_selector(
            "div > div > div > div > div > div > div > div > "
            "div.row.collapse > div > div > div:nth-child(2) > div > div > "
            "div > div > div > div.row.small-collapse.medium-collapse > "
            "div:nth-child(1) > div > div > div > div.lbl-offer > span")
        langs0_text = []
        for lang in langs0:
            try:
                langs0_text.append(lang.text)
            except StaleElementReferenceException:
                pass

        directory = 'C:\\A.csv'
        with open(directory, 'a', newline='', encoding="utf-8") as outfile:
            writer = csv.writer(outfile)
            for row in zip(langs0_text):
                writer.writerow(row)
    except StaleElementReferenceException:
        pass

If you cannot access the page, you will need a VPN.

Update...

Maybe that element loads before the other elements do. So what if we change the wait to target the scraped data itself (though not all pages have data to scrape)?

Adding:

try:
    clickMe = wait(driver, 13).until(EC.element_to_be_clickable(
        (By.CSS_SELECTOR,
         "div > div > div > div > div > div > div > div > "
         "div.row.collapse > div > div > div:nth-child(2) > div > div > "
         "div > div > div > div.row.small-collapse.medium-collapse > "
         "div:nth-child(3) > div > div > div > div.lbl-offer > span")))
except TimeoutException as ex:
    pass

The same problem remains.
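For a page that legitimately has no offers, a timeout on the data wait is expected rather than an error, so the try/except above swallows it, but the wait still burns the full 13 seconds on every empty page. The behaviour being relied on can be sketched in plain Python with no Selenium at all (names here are my own, not a library API): poll a condition for a bounded time and fall back to an empty list instead of raising.

```python
import time

def poll_until(condition, timeout=13.0, interval=0.5):
    """Poll `condition` (a no-argument callable) until it returns a
    truthy value or `timeout` seconds pass; on timeout return an empty
    list, so pages with no offers yield no rows instead of an exception."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            return []
        time.sleep(interval)
```

In the script this would wrap the element lookup, e.g. `langs0 = poll_until(lambda: driver.find_elements_by_css_selector(selector))`, with `selector` being the team-names selector defined below.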

Manual steps:

  1. Open the page https://ubet.com/sports/soccer/nexttoplay/

  2. Click the next-to-play games under the competition selector.

  3. Click the first element in the dropdown (for me that is English Premier).

  4. Wait for the page to load by waiting until the 'team names' are present. 'Team names' is defined below.

  5. Scrape the 'team names' data.

  6. Scrape the 'team names' data for every element in the dropdown. The next one for me is England FA Cup. We want to keep scraping the 'team names' data until we have gone through the entire dropdown.

  7. Team names =

    div > div > div > div > div > div > div > div > div.row.collapse > div > div > div:nth-child(2) > div > div > div > div > div > div.row.small-collapse.medium-collapse > div:nth-child(1) > div > div > div > div.lbl-offer > span 
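The loop in steps 2-6 can be sketched abstractly. The stale-element errors in the script suggest that option elements found before a click are invalidated once the page updates, so re-fetching the option list on every pass avoids holding stale references. `iterate_options` below is my own illustration in plain Python (`find_options` stands in for the Selenium option lookup, `handle` for the click-wait-scrape body of the loop):

```python
def iterate_options(find_options, handle):
    """Steps 2-6 sketched abstractly: re-fetch the option list on every
    pass (so a page update cannot leave us holding a stale element),
    process one option per pass, and stop once all have been visited.
    Returns the number of options handled."""
    visited = 0
    while True:
        options = find_options()      # fresh lookup each iteration
        if visited >= len(options):
            return visited
        handle(options[visited])      # click + wait + scrape in the script
        visited += 1
```

In the real script, `find_options` would be `lambda: driver.find_elements_by_xpath('//select/optgroup/option')` and `handle` would click the option, wait for the team names, and append them to the CSV.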
    

0 answers