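I am trying to scrape the Nasdaq-100 ticker symbols from the CNBC page https://www.cnbc.com/nasdaq-100/. I am new to Beautiful Soup, but I am open to any solution if there is a simpler way to scrape the list and save the data. The code below does not raise an error; however, it does not return any tickers either.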
import bs4 as bs
import pickle  # serializes any Python object so that we do not have to go back to the CNBC website each time we want
# to use the 100 ticker symbols
import requests
def save_nasdaq_tickers():
    '''We start by getting the source code for CNBC. We will use the requests module for this.'''
    resp = requests.get('https://www.cnbc.com/nasdaq-100')
    soup = bs.BeautifulSoup(resp.text, "lxml")  # resp.text is the text of the page source returned by requests
    table = soup.find('table', {'class': "data quoteTable"})  # the table whose class we think matches the data we want from CNBC
    tickers = []  # empty tickers list
    # Next we iterate through the table.
    for row in table.findAll('tr')[1:]:  # all table rows except the header row (row 0), so [1:]
        ticker = row.findAll('td')[0].txt  # td is a table cell; 0 is the first column, which I took to be the tickers
        # We specify .txt because it is a soup object
        tickers.append(ticker)
    # Save this list of tickers using pickle and with open???
    with open("Nasdaq100Tickers", "wb") as f:  # name the file Nasdaq100... etc
        pickle.dump(tickers, f)  # dumping the tickers to file f
    print(tickers)
    return tickers
save_nasdaq_tickers()
Answer 0 (score: 2)
If you are wondering why there is nothing in tickers, there is just one small mistake in your code. Change
ticker = row.findAll('td')[0].txt
to
ticker = row.findAll('td')[0].text
However, when you want the full content of a dynamically rendered page, you will need selenium.
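The reason no error is raised is that Beautiful Soup treats an unknown attribute such as .txt as a lookup for a child tag named txt, so it evaluates to None rather than failing. A minimal sketch of the difference follows; the HTML table, its values, and the html.parser backend are assumptions made up for illustration and are not CNBC's actual markup.

```python
from bs4 import BeautifulSoup

# Hypothetical static table standing in for the real page markup.
html = """
<table class="data quoteTable">
  <tr><th>Symbol</th><th>Price</th></tr>
  <tr><td>AAPL</td><td>1.00</td></tr>
  <tr><td>MSFT</td><td>2.00</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "data quoteTable"})
rows = table.find_all("tr")[1:]  # skip the header row

# .txt is read as "find a child <txt> tag", which does not exist, so it is None.
wrong = [row.find_all("td")[0].txt for row in rows]
# .text returns the text content of the cell.
right = [row.find_all("td")[0].text for row in rows]

print(wrong)
print(right)
```

If the table is injected by JavaScript after the page loads, soup.find may still return None on the downloaded HTML, which is the case the selenium suggestion is meant for.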
Answer 1 (score: 1)
You can mimic the XHR request the page makes and parse the JSON that contains the data you are after.
import json
import requests
import pandas as pd
# JSONP endpoint requested by the page; the symbols are pipe-separated.
url = 'https://quote.cnbc.com/quote-html-webservice/quote.htm?partnerId=2&requestMethod=quick&exthrs=1&noform=1&fund=1&output=jsonp&symbols=AAL|AAPL|ADBE|ADI|ADP|ADSK|ALGN|ALXN|AMAT|AMGN|AMZN|ATVI|ASML|AVGO|BIDU|BIIB|BMRN|CDNS|CELG|CERN|CHKP|CHTR|CTRP|CTAS|CSCO|CTXS|CMCSA|COST|CSX|CTSH&callback=quoteHandler1'
res = requests.get(url)
# The body is JSONP, i.e. quoteHandler1({...}); keep only the JSON between the outer parentheses.
body = res.text
s = body[body.index('(') + 1:body.rindex(')')]
data = json.loads(s)
# pd.json_normalize (pandas 1.0 or later) replaces the deprecated pandas.io.json.json_normalize
# and already returns a DataFrame.
df = pd.json_normalize(data)
print(df[['symbol', 'last']])
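One caution when unwrapping a JSONP body: str.strip(chars) removes any leading and trailing characters that appear in chars, not a literal prefix or suffix, so it only works as long as the payload does not begin or end with one of those characters. Slicing between the first "(" and the last ")" avoids that. The sketch below runs offline; the helper name unwrap_jsonp and the sample payload are hypothetical and only illustrate the idea, not the real response.

```python
import json


def unwrap_jsonp(body: str) -> str:
    """Return the text between the first "(" and the last ")" of a JSONP body."""
    start = body.index("(") + 1
    end = body.rindex(")")
    return body[start:end]


# Made-up payload for illustration only; values are not real quotes.
sample = 'quoteHandler1({"symbol": "AAPL", "last": "1.00"})'
record = json.loads(unwrap_jsonp(sample))
print(record["symbol"], record["last"])

# str.strip treats its argument as a set of characters, so here it also
# removes the payload itself, because n, u and l all occur in the callback name.
stripped = "quoteHandler1(null)".strip("quoteHandler1(").strip(")")
print(repr(stripped))
```

The slice also tolerates a trailing semicolon after the closing parenthesis, which some JSONP endpoints append.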
The request returns JSON in the following form (sample expanded):