I am trying to scrape CNBC's NASDAQ-100 data from its website using BeautifulSoup, but when I try to convert the scraped data into a DataFrame, it prints an empty frame: `Empty DataFrame, Columns: [], Index: []`.
Here is my code:
```python
# Importing libraries
from bs4 import BeautifulSoup
import requests
import csv
import pandas as pd

# Create parse tree for parsed pages
page = requests.get("https://www.cnbc.com/nasdaq-100")
# content = page.content

# Scrape data from specific <div> column
# Title for the data table -> NASDAQ-100
soup = BeautifulSoup(page.content, "html.parser")

l = []
title = soup.find("div", {"class": "PageHeader-main"}).find("h1").text
table = soup.find_all("table", {"class": "BasicTable-basicTable"})
for items in table:
    for i in range(len(items.find_all("tr")) - 1):
        # Gather data
        d = {}
        d["stock_symbol"] = items.find_all("td", {"class": "BasicTable-symbol"})[i].find("a").text
        d["stock_name"] = items.find_all("td", {"class": "BasicTable-name"})[i].text
        d["price"] = items.find_all("td", {"class": "BasicTable-unchanged BasicTable-numData"})[i].text
        d["price_change"] = items.find_all("td", {"class": "BasicTable-quoteDecline"})[i].text
        d["percentage_change"] = items.find_all("td", {"class": "BasicTable-quoteDecline"})[i].text
        l.append(d)

df = pd.DataFrame(l)
print(df)
```
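A quick way to confirm what is going wrong is to check whether the selectors matched anything before building rows: on a page whose table is injected by JavaScript, `find_all` sees nothing in the raw HTML that `requests` downloads, and `pd.DataFrame([])` is exactly the empty frame shown above. A minimal sketch, using a hypothetical static HTML string standing in for the fetched page (this is not CNBC's real markup):

```python
from bs4 import BeautifulSoup
import pandas as pd

# What requests.get() actually receives from a JavaScript-rendered page:
# the container exists, but the table rows are filled in later by JS.
raw_html = '<div id="nasdaq-table"><!-- rows injected by JavaScript --></div>'

soup = BeautifulSoup(raw_html, "html.parser")
tables = soup.find_all("table", {"class": "BasicTable-basicTable"})
print(len(tables))  # 0 -> the selector matched nothing

rows = []
for table in tables:  # loop body is never entered
    pass

df = pd.DataFrame(rows)
print(df.empty)  # True -> "Empty DataFrame, Columns: [], Index: []"
```

Printing `len(tables)` right after the `find_all` call would have surfaced the problem immediately.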
Answer 0 (score: 2)
The site you are working with renders its data with JavaScript after the page loads, so you have two options: replicate the XHR request the page sends to the API where the data actually lives, or use the selenium approach. Both solutions are listed below:
```python
import requests
import json

# Call the quote API that the page itself queries via XHR
r = requests.get("https://quote.cnbc.com/quote-html-webservice/quote.htm?noform=1&partnerId=2&fund=1&exthrs=0&output=json&symbolType=issue&symbols=153171|172296|74548134|178129|90065764|185811|181702|3145559|8279577|8392868|196573|197784|177124|144094|205778|207106|208206|208526|217706|211573|217809|218647|25427545|223056|225584|226052|226354|90065765|227524|237331|240690|244210|253970|263397|248911|264170|256951|273612|24812378|274516|7186257|9079610|4038959|282500|21167615|282560|283581|284350|50675033|288727|288976|289807&requestMethod=extended").json()

# Pretty-print the payload and inspect its top-level keys
data = json.dumps(r, indent=4)
print(data)
print(r.keys())
```
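The JSON route pairs naturally with `pandas.json_normalize` once you know where the list of quotes sits in the payload. The exact structure of CNBC's response is not shown above, so the dictionary below is an assumed shape purely for illustration; adapt the keys to whatever `r.keys()` reveals:

```python
import pandas as pd

# Hypothetical payload shape -- NOT the verified CNBC response structure.
payload = {
    "QuickQuoteResult": {
        "QuickQuote": [
            {"symbol": "AAPL", "name": "Apple Inc", "last": "189.84"},
            {"symbol": "MSFT", "name": "Microsoft Corp", "last": "420.55"},
        ]
    }
}

# Flatten the list of quote dicts into one row per symbol
df = pd.json_normalize(payload["QuickQuoteResult"]["QuickQuote"])
print(df)
```

Compared with scraping rendered HTML, the API already returns clean, typed fields, so there is no class-name guessing at all.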
```python
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import pandas as pd
from time import sleep

options = Options()
options.add_argument('--headless')

driver = webdriver.Firefox(options=options)
driver.get("https://www.cnbc.com/nasdaq-100")
sleep(2)  # give the JavaScript-rendered table time to load

# The browser has executed the JS, so the table now exists in the DOM
df = pd.read_html(driver.page_source)[0]
print(df)
df.to_csv("result.csv", index=False)
driver.quit()
```
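Whichever route fetches the HTML, iterating row by row is more robust than the parallel `find_all` lists in the question, because a single missing cell cannot shift every later column out of alignment. A sketch on a static table (the markup here is illustrative, not CNBC's actual page):

```python
from bs4 import BeautifulSoup
import pandas as pd

# Illustrative markup standing in for a fully rendered page
html = """
<table class="BasicTable-basicTable">
  <tr><th>Symbol</th><th>Name</th><th>Price</th></tr>
  <tr><td>AAPL</td><td>Apple Inc</td><td>189.84</td></tr>
  <tr><td>MSFT</td><td>Microsoft Corp</td><td>420.55</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for tr in soup.find("table").find_all("tr")[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) == 3:  # keep only complete rows
        rows.append(dict(zip(["stock_symbol", "stock_name", "price"], cells)))

df = pd.DataFrame(rows)
print(df)
```

Each row is built from its own `<tr>`, so incomplete rows are simply skipped instead of corrupting the whole frame.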
Output: check online