I am trying to download all of the zip files listed in the table on the page below, but I cannot find the table in the soup; the lookup returns nothing.
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

req = Request(
    'https://www.misoenergy.org/markets-and-operations/market-reports/market-report-archives/#nt=%2FMarketReportType%3ABids%2FMarketReportName%3AArchived%20Cleared%20Bids%20%20(zip)&t=10&p=0&s=FileName&sd=desc',
    headers={'User-Agent': 'Mozilla/5.0'})
page = urlopen(req).read()
soup = BeautifulSoup(page, "html.parser")
# Returns None: the table is not in the raw HTML because it is injected by JavaScript
tables = soup.find('div', class_='table table-bordered docnav-metadata dataTable no-footer')
Answer 0 (score: 1)
import requests_html

link = "https://www.misoenergy.org/markets-and-operations/market-reports/market-report-archives/#nt=%2FMarketReportType%3ABids%2FMarketReportName%3AArchived%20Cleared%20Bids%20%20(zip)&t=10&p=0&s=FileName&sd=desc"

with requests_html.HTMLSession() as session:
    r = session.get(link)
    # render() executes the page's JavaScript so the table is present in the HTML
    r.html.render(sleep=5, timeout=8)
    for items in r.html.find("table.dataTable tr.desktop-row"):
        data = [item.text for item in items.find("td")]
        print(data)
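If the goal is actually to download the zip files rather than print cell text, the same rendered HTML can be mined for links. Below is a minimal sketch along the same lines; it assumes the report links appear as ordinary <a> elements with hrefs ending in .zip inside those table rows, which should be verified against the live page.
import os
import requests_html

link = "https://www.misoenergy.org/markets-and-operations/market-reports/market-report-archives/#nt=%2FMarketReportType%3ABids%2FMarketReportName%3AArchived%20Cleared%20Bids%20%20(zip)&t=10&p=0&s=FileName&sd=desc"

with requests_html.HTMLSession() as session:
    r = session.get(link)
    r.html.render(sleep=5, timeout=8)
    # Assumption: each desktop row exposes its download link as a plain <a href="...zip">
    zip_urls = {url
                for row in r.html.find("table.dataTable tr.desktop-row")
                for url in row.absolute_links
                if url.lower().endswith(".zip")}
    for url in zip_urls:
        file_name = os.path.basename(url.split("?")[0])
        with open(file_name, "wb") as f:
            f.write(session.get(url).content)
        print("saved", file_name)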
Answer 1 (score: 0)
As mentioned, you need something like Selenium to load the page, because the content is rendered dynamically. You also need to make it wait for the page to load before grabbing the table.
Note: I use time.sleep() to wait, but I have read that this is not the best solution and that WebDriverWait is recommended instead. I am still figuring out how it works, so I will update this answer once I have played with it. In the meantime, this should get you started.
import bs4
from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get('https://www.misoenergy.org/markets-and-operations/market-reports/market-report-archives/#nt=%2FMarketReportType%3ABids%2FMarketReportName%3AArchived%20Cleared%20Bids%20%20(zip)&t=10&p=0&s=FileName&sd=desc')
# Crude wait for the JavaScript-rendered table to appear
time.sleep(5)
html = driver.page_source
soup = bs4.BeautifulSoup(html, 'html.parser')
tables = soup.find_all('table', {'class': 'table table-bordered docnav-metadata dataTable no-footer'})
Update: this works for me using WebDriverWait:
import bs4
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.misoenergy.org/markets-and-operations/market-reports/market-report-archives/#nt=%2FMarketReportType%3ABids%2FMarketReportName%3AArchived%20Cleared%20Bids%20%20(zip)&t=10&p=0&s=FileName&sd=desc')
# Block (up to 10 seconds) until the table element is actually present in the DOM
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "table.table-bordered.docnav-metadata.dataTable.no-footer")))
html = driver.page_source
soup = bs4.BeautifulSoup(html, 'html.parser')
tables = soup.find_all('table', {'class': 'table table-bordered docnav-metadata dataTable no-footer'})
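From here the tables can be searched for the zip links and each file saved to disk. A minimal sketch, assuming the rows contain plain <a href="...zip"> elements (hypothetical, inspect the actual markup) and reusing urllib as in the question:
from urllib.parse import urljoin
from urllib.request import Request, urlopen

base = 'https://www.misoenergy.org/'
for table in tables:
    for a in table.find_all('a', href=True):
        href = a['href']
        if not href.lower().endswith('.zip'):
            continue  # skip anything that is not a zip download
        url = urljoin(base, href)  # resolve relative hrefs against the site root
        file_name = url.rsplit('/', 1)[-1]
        req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
        with open(file_name, 'wb') as f:
            f.write(urlopen(req).read())
        print('saved', file_name)
# Close the browser once scraping is done
driver.quit()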