How do I deal with awkward web pages where data scraped in a fashion similar to this does not line up correctly? I tried to do something similar below without luck, because the structure of the page is not that simple. I know how to deal with unequal data, since the web page randomly renders the data unevenly.
Desired:
Azam FC v Mwenge 1.8 https://www.bet365.com.au/#/AC/B1/C1/D13/E104/F16/S1/
Western Sydney Wanderers v Melbourne City 2.87 https://www.bet365.com.au/#/AC/B1/C1/D13/E101/F16/S1/
Sydney FC v Newcastle Jets 1.53 https://www.bet365.com.au/#/AC/B1/C1/D13/E101/F16/S1/
Output looks like:
Azam FC v Mwenge 1.8 https://www.bet365.com.au/#/AC/B1/C1/D13/E104/F16/S1/
Western Sydney Wanderers v Melbourne City 1.53 https://www.bet365.com.au/#/AC/B1/C1/D13/E101/F16/S1/
The 1.53 should not belong to Western Sydney Wanderers; it should belong to Sydney FC.
Script.py
import collections
import csv
import time

from selenium import webdriver
from selenium.common.exceptions import TimeoutException, NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.maximize_window()
driver.get('https://www.bet365.com.au/#/AS/B1/')

def page_counter():
    for x in range(1000):
        yield x

count = page_counter()

# Collect the coupon labels listed under "Main Lists" once up front.
xp_labels = ('//div[div/div/text()="Main Lists"]'
             '//div[starts-with(@class, "sm-CouponLink_Label") and normalize-space()]')
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, xp_labels)))
coupon_labels = [x.text for x in driver.find_elements_by_xpath(xp_labels)]
links = dict((next(count) + 1, e) for e in coupon_labels)
desc_links = collections.OrderedDict(sorted(links.items(), reverse=True))

for key, label in desc_links.items():
    # Return to the landing page and open the coupon for this label.
    driver.get('https://www.bet365.com.au/#/AS/B1/')
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, xp_labels)))
    driver.find_element_by_xpath(f'//div[contains(text(), "{label}")]').click()

    groups = '/html/body/div[1]/div/div[2]/div[1]/div/div[2]/div[2]/div/div/div[2]/div'
    xp_match_link = "//div//div[contains(@class, 'sl-CouponParticipantWithBookCloses_Name ')]"
    xp_bp1 = ("//div[contains(@class, 'gl-Market_HasLabels')]"
              "/following-sibling::div[contains(@class, 'gl-Market_PWidth-12-3333')][1]"
              "//div[contains(@class, 'gl-ParticipantOddsOnly')]")
    try:
        # Wait for the odds to populate the coupon tables.
        WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, xp_bp1)))
        time.sleep(2)
        data = []
        for elem in driver.find_elements_by_xpath(groups):
            try:
                match_link = elem.find_element_by_xpath(xp_match_link).get_attribute('href')
            except NoSuchElementException:
                match_link = None
            try:
                bp1 = elem.find_element_by_xpath(xp_bp1).text
            except NoSuchElementException:
                bp1 = None
            data.append([bp1, match_link])
        print(data)
        with open('C:\\daw.csv', 'a', newline='', encoding='utf-8') as outfile:
            writer = csv.writer(outfile)
            for row in data:
                writer.writerow(row)
    except TimeoutException:
        pass
    except NoSuchElementException as ex:
        print(ex)
        break

driver.close()
Answer 0 (score: 0)
It should work if you change the following xpath:
xp_match_link = "//div//div[contains(@class, 'sl-CouponParticipantWithBookCloses_NameContainer ')]"
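
For reference, here is a minimal sketch of the extraction loop with that xpath substituted in. It is untested against the live site, and bet365's class names change often, so treat the selectors as assumptions. One related thing worth checking: an XPath that starts with // is evaluated against the whole document even when passed to element.find_element, so the per-row lookups only stay scoped to their row if the expressions are prefixed with a dot. A document-wide match pairs every row with the first name on the page, which looks a lot like the mispairing described in the question.

# Sketch only: selectors are assumptions based on the script in the question.
xp_match_link = ".//div[contains(@class, 'sl-CouponParticipantWithBookCloses_NameContainer ')]"
xp_bp1 = (".//div[contains(@class, 'gl-Market_HasLabels')]"
          "/following-sibling::div[contains(@class, 'gl-Market_PWidth-12-3333')][1]"
          "//div[contains(@class, 'gl-ParticipantOddsOnly')]")

data = []
for elem in driver.find_elements_by_xpath(groups):
    try:
        # The leading "." keeps the search inside this row's subtree,
        # so the name and the odds come from the same row.
        match_link = elem.find_element_by_xpath(xp_match_link).get_attribute('href')
    except NoSuchElementException:
        match_link = None
    try:
        bp1 = elem.find_element_by_xpath(xp_bp1).text
    except NoSuchElementException:
        bp1 = None
    data.append([bp1, match_link])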