I can't scrape table data with Selenium and BeautifulSoup

Time: 2019-09-28 22:18:48

Tags: python selenium-webdriver web-scraping beautifulsoup

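I've tried everything I can think of, but I can't seem to scrape the data from the table. I've searched Stack Overflow for answers, but nothing seems to work. Either the table comes back empty or I can't find the elements in the table at all. I'm using the table on a Yahoo Daily Fantasy page.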

Note: the URL used here may change every week, so it may no longer be valid in the future.

Current code:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC 
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait

driver = webdriver.Chrome()
driver.get("https://sports.yahoo.com/dailyfantasy/contest/5416455/setlineup")

response = wait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME,"data-tst-player-id")))
driver.quit

soup = BeautifulSoup(response, 'lxml')
with open('test.txt','w', encoding='utf-8') as f_out:
    f_out.write(soup.prettify())

1 answer:

Answer 0 (score: 1)

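No element matches the locator in this line: By.TAG_NAME searches for elements by tag name, and there is no tag named data-tst-player-id: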

response = wait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME,"data-tst-player-id")))
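To see why that lookup finds nothing, here is a small standard-library sketch that contrasts a tag-name lookup with an attribute lookup. It assumes data-tst-player-id is an attribute name (as its data- prefix suggests); the markup in SAMPLE_HTML is hypothetical and is not the actual Yahoo page, and LocatorDemo is an illustrative name. A CSS attribute selector such as [data-tst-player-id], the same form as the [data-tst] selector in the working code, matches by attribute instead.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration only; the real page structure is not shown in the question.
SAMPLE_HTML = """
<table>
  <tr data-tst-player-id="123"><td>Player A</td></tr>
  <tr data-tst-player-id="456"><td>Player B</td></tr>
</table>
"""


class LocatorDemo(HTMLParser):
    """Records every start-tag name and the values of one attribute,
    to contrast a tag-name lookup with an attribute lookup."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.tags = []
        self.attr_values = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        for name, value in attrs:
            if name == self.attr_name:
                self.attr_values.append(value)


parser = LocatorDemo("data-tst-player-id")
parser.feed(SAMPLE_HTML)

# Tag-name lookup (what By.TAG_NAME does): no element is named "data-tst-player-id".
print("data-tst-player-id" in parser.tags)  # False
# Attribute lookup (what a selector like [data-tst-player-id] does): two matches.
print(parser.attr_values)  # ['123', '456']
```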

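However, some elements do carry a 'data-tst' attribute, so you can wait for one of those to make sure the page has loaded. Also, on this line

driver.quit

you don't actually do anything: without parentheses it only looks up the method, so you have to call it as driver.quit(). Finally, until() returns the located element rather than the page HTML, so pass driver.page_source to BeautifulSoup instead. Working code: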

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC 
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait

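driver = webdriver.Chrome()
driver.get("https://sports.yahoo.com/dailyfantasy/contest/5416455/setlineup")
# Wait up to 10 seconds for at least one element with a data-tst attribute to appear
wait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "[data-tst]")))
# Read the rendered HTML before closing the browser
response = driver.page_source
driver.quit()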

soup = BeautifulSoup(response, 'lxml')
with open('test.txt','w', encoding='utf-8') as f_out:
    f_out.write(soup.prettify())
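Two details of the working code can be checked without a browser: the wait hands back whatever its condition returns (an element, not page HTML), which is why the page markup is read from driver.page_source, and driver.quit without parentheses never calls the method. The sketch below assumes hypothetical names: FakeDriver stands in for a WebDriver, and wait_until is a simplified polling loop in the spirit of WebDriverWait.until, not Selenium's actual implementation (it raises the built-in TimeoutError rather than Selenium's own exception).

```python
import time


class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver (illustration only)."""

    def __init__(self):
        self.closed = False
        self.page_source = '<html><body><div data-tst="lineup"></div></body></html>'

    def quit(self):
        self.closed = True


def wait_until(driver, condition, timeout=10.0, poll=0.5):
    """Simplified polling loop (not Selenium's code): call condition(driver)
    until it returns something truthy, then return that value."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met before the timeout")
        time.sleep(poll)


driver = FakeDriver()

# The wait returns the condition's result (a stand-in "element"), not the HTML...
element = wait_until(driver, lambda d: {"data-tst": "lineup"}, timeout=1)
# ...so the page markup has to be read separately, before the browser is closed.
html = driver.page_source

driver.quit  # bare attribute lookup: the method is never called
print(driver.closed)  # False
driver.quit()  # the parentheses actually call it
print(driver.closed)  # True
```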