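I'm working on a sample/personal project that retrieves stock data for specific stocks once a day from a website such as MarketWatch, then compares that data against other sites (such as Google Finance, Yahoo Finance, or Reuters) to test its accuracy.
I'm stuck on retrieving the data from MarketWatch. The "Key Data" I'm looking for (found at https://www.marketwatch.com/investing/stock/aapl) appears to be generated dynamically: when I fetch the page's HTML programmatically, it contains almost none of the data I see when visiting the site in a browser.
I've tried opening the browser's developer console and looking for AJAX calls, but I couldn't find anything. I could easily skip collecting data from MarketWatch and move on, but I see this as a challenge to improve my l33t programming skills.
Can anyone point me in the right direction? I'd like to find the correct call that requests the data, or is the data perhaps only shown when specific values are sent in the headers? That's my guess. I'm using Python and Beautiful Soup to parse the data.
Thanks for your time.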
Answer 0 (score: 1)
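NEW INFORMATION BASED ON OP'S COMMENT
My mistake! Try something like this. The key is to get the Cookie and make sure it is in your GET request headers. You can grab this Cookie manually from the Network tab of your browser's developer tools while on that MarketWatch page. Find the GET request for the page there, open the request headers, and copy/paste the Cookie into your code. It is a very long string. You'll need it for the server to return the full page.
I'm sure there is a way to obtain this Cookie from marketwatch.com in code before making the actual GET request that contains your data. I can try to figure that out too if you need it.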
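Because the copied Cookie is one long `name=value; name=value` string, it can be easier to maintain as a dictionary. The sketch below is one possible way to do that using only the standard library; `parse_cookie_header` is a hypothetical helper name, not part of requests or Beautiful Soup. It splits each pair on the first `=` only, since some values (such as `__gads` in the cookie used in this answer) contain `=` themselves.

```python
def parse_cookie_header(cookie_header):
    """Split a copied Cookie request header into a {name: value} dict."""
    cookies = {}
    for pair in cookie_header.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing ';' or doubled separators
        # Split on the first '=' only: values may contain '=' themselves.
        name, _, value = pair.partition("=")
        cookies[name] = value
    return cookies


# Shortened sample taken from the cookie used in this answer.
sample = "refresh=on; seenads=0; __gads=ID=c71aa1564ab44c97:T=1534997138"
print(parse_cookie_header(sample))
```

The resulting dictionary can be passed to requests through the `cookies=` argument of `requests.get` instead of a hand-written "Cookie" header.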
import requests
from bs4 import BeautifulSoup
url = 'https://www.marketwatch.com/investing/stock/aapl'
r = requests.get(url, headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Cookie": "refresh=on; letsGetMikey=enabled; "
"MicrosoftApplicationsTelemetryDeviceId=46fa0ca5-2561-7fe5-fd62-5b632398b7f4; "
"MicrosoftApplicationsTelemetryFirstLaunchTime=1534997155966; "
"pf_ffm=9bffce74bd493d996d1ae35769695510; "
"mw_loc=%7B%22country%22%3A%22US%22%2C%22region%22%3A%22TX%22%2C%22city%22%3A%22"
"PLANO%22%2C%22county%22%3A%5B%22COLLIN%22%5D%2C%22continent%22%3A%22NA%22%7D; "
"seenads=0; fullcss-quote=quote-85dcea2e5c.min.css; "
"utag_main=v_id:016564f545b40022de054359ac4403044003000900bd0$_sn:1$_ss:0$_st:"
"1534999294146$ses_id:1534997120440%3Bexp-session$_pn:2%3Bexp-session$"
"_prevpage:MW_Quote_Page%3Bexp-1535001094154$vapi_domain:marketwatch.com; "
"AMCV_CB68E4BA55144CAA0A4C98A5%40AdobeOrg=-1891778711%7CMCIDTS%7C17767%7CMCMID"
"%7C01084133064198290912411637324115388504%7CMCAAMLH-1535601937%7C9%7CMCAAMB"
"-1535601937%7CRKhpRz8krg2tLO6pguXWp5olkAcUniQYPHaMWWgdJ3xzPWQmdj0y%7CMCOPTOUT"
"-1535004337s%7CNONE%7CMCSYNCSOP%7C411-17774%7CMCAID%7CNONE%7CvVersion%7C2.4.0"
"; icons-loaded=true; AMCVS_CB68E4BA55144CAA0A4C98A5%40AdobeOrg=1; __gads=ID="
"c71aa1564ab44c97:T=1534997138:S=ALNI_Mbyv41MxhHTThfXFxMGtCFVyzsQaQ; "
"vidoraUserId=agqj4i6ugtd359uhgkfl761k4uu55g; __qca=P0-1349423161-15349971262"
"36; _ncg_sp_ses.f57d=*; _ncg_sp_id.f57d=b8b37d7b-2719-4a9c-baf9-3695f9deb20"
"8.1534997155.1.1534997520.1534997155.9a279294-1ff6-449e-a605-29c39215cfb4;"
" _ncg_id_=b8b37d7b-2719-4a9c-baf9-3695f9deb208; _ncg_g_id_=bd42bc08-4ebd-"
"44a1-8e7a-6e1c3eaac874; _parsely_visitor={%22id%22:%2211d7400c-c4f5-4322-"
"b109-0b01a21a74f2%22%2C%22session_count%22:1%2C%22last_session_"
"ts%22:1534997165510}; _parsely_session={%22sid%22:1%2C%22surl%22:"
"%22https://www.marketwatch.com/investing/stock/aapl%22%2C%22sref%22:"
"%22%22%2C%22sts%22:1534997165510%2C%22slts%22:0}; "
"s_ppvl=MW_Quote_Page%2C27%2C27%2C945%2C1076%2C945%2C1920%2C1080%2C1%2CP;"
" s_ppv=MW_Quote_Page%2C23%2C23%2C945%2C1076%2C945%2C1920%2C1080%2C1%2CP;"
" s_cc=true; cX_P=jl61ojfsquomne4l; usr_bkt=63L1D4y2F9; cX_S=jl61ojgcxgxachax;"
" cX_G=cx%3A12c0heqgxq7ug25eyhsfbg5iro%3A3qlznewunoji0; "
"recentqsmkii=Stock-US-AAPL; __utma=246750488.1666075546.1534997552."
"1534997552.1534997552.1; __utmb=246750488.1.9.1534997559734; "
"__utmc=246750488; __utmz=246750488.1534997552.1.1.utmcsr=(direct)"
"|utmccn=(direct)|utmcmd=(none)"})
print(r)
soup = BeautifulSoup(r.content, "html.parser")
key_data = soup.find_all('li', class_="kv__item")
# Key Data Field Names
print(soup.find_all('small', class_="kv__label"))
# Key Data Field Values
print(soup.find_all('span', class_="kv__primary"))
Response:
<Response [200]>
[<small class="kv__label">Open</small>, <small class="kv__label">Day Range</small>, <small class="kv__label">52 Week Range</small>, <small class="kv__label">Market Cap</small>, <small class="kv__label">Shares Outstanding</small>, <small class="kv__label">Public Float</small>, <small class="kv__label">Beta</small>, <small class="kv__label">Rev. per Employee</small>, <small class="kv__label">P/E Ratio</small>, <small class="kv__label">EPS</small>, <small class="kv__label">Yield</small>, <small class="kv__label">Dividend</small>, <small class="kv__label">Ex-Dividend Date</small>, <small class="kv__label">Short Interest</small>, <small class="kv__label">% of Float Shorted</small>, <small class="kv__label">Average Volume</small>]
[<span class="kv__value kv__primary ">$214.10</span>, <span class="kv__value kv__primary ">213.84 - 216.36</span>, <span class="kv__value kv__primary ">149.16 - 219.18</span>, <span class="kv__value kv__primary ">$1.04T</span>, <span class="kv__value kv__primary ">4.83B</span>, <span class="kv__value kv__primary ">4.82B</span>, <span class="kv__value kv__primary ">1.02</span>, <span class="kv__value kv__primary ">$2.08M</span>, <span class="kv__value kv__primary ">19.50</span>, <span class="kv__value kv__primary ">$11.03</span>, <span class="kv__value kv__primary ">1.36%</span>, <span class="kv__value kv__primary ">$0.73</span>, <span class="kv__value kv__primary ">Aug 10, 2018</span>, <span class="kv__value kv__primary ">37.27M</span>, <span class="kv__value kv__primary ">0.77%</span>, <span class="kv__value kv__primary ">24.1M</span>]
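The two printed lists line up position by position, so they can be zipped into a single mapping. The sketch below does this with the standard library's `html.parser` rather than Beautiful Soup, purely so that it runs without extra installs; `KeyDataParser` and `key_data_to_dict` are hypothetical names, and the class names `kv__label` and `kv__primary` are the ones seen in the output above, which the site may change at any time.

```python
from html.parser import HTMLParser


class KeyDataParser(HTMLParser):
    """Collect the text of kv__label and kv__primary elements."""

    def __init__(self):
        super().__init__()
        self.labels = []
        self.values = []
        self._target = None  # list currently receiving text, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "small" and "kv__label" in classes:
            self._target = self.labels
        elif tag == "span" and "kv__primary" in classes:
            self._target = self.values
        else:
            return
        self._target.append("")  # start a new text entry

    def handle_data(self, data):
        if self._target is not None:
            self._target[-1] += data

    def handle_endtag(self, tag):
        if tag in ("small", "span"):
            self._target = None


def key_data_to_dict(html):
    """Return {label: value} for the Key Data items found in html."""
    parser = KeyDataParser()
    parser.feed(html)
    parser.close()
    labels = [text.strip() for text in parser.labels]
    values = [text.strip() for text in parser.values]
    return dict(zip(labels, values))


# Two items copied from the response shown above.
sample_html = (
    '<li class="kv__item"><small class="kv__label">Open</small>'
    '<span class="kv__value kv__primary ">$214.10</span></li>'
    '<li class="kv__item"><small class="kv__label">Beta</small>'
    '<span class="kv__value kv__primary ">1.02</span></li>'
)
print(key_data_to_dict(sample_html))
```

With the Beautiful Soup result sets from the code above, the same pairing would be a `zip` over the `get_text(strip=True)` of each label tag and value tag.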
END NEW INFORMATION
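If you want the daily stock price data from the chart on that MarketWatch page, something like the following will work. They do have an API route. You may need to update the EntitlementToken for it to work: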
import requests
import json
# You may need to update the EntitlementToken. To do so, go to https://www.marketwatch.com/investing/stock/aapl,
# watch the network connections, find the API call, and parse out the token.
# If the token does not match, the API call will return a 400.
req_url = 'https://api-secure.wsj.net/api/michelangelo/timeseries/history?json={"Step":"PT1M","TimeFrame":"D1",' \
'"EntitlementToken":"cecc4267a0194af89ca343805a3e57af","IncludeMockTick":true,"FilterNullSlots":false,' \
'"FilterClosedPoints":true,"IncludeClosedSlots":false,"IncludeOfficialClose":true,"InjectOpen":false,' \
'"ShowPreMarket":false,"ShowAfterHours":false,"UseExtendedTimeFrame":false,"WantPriorClose":true,' \
'"IncludeCurrentQuotes":false,"ResetTodaysAfterHoursPercentChange":false,' \
'"Series":[{"Key":"STOCK/US/XNAS/AAPL","Dialect":"Charting","Kind":"Ticker","SeriesId":"s1",' \
'"DataTypes":["Last"],"Indicators":[{"Parameters":[{"Name":"ShowOpen"},{"Name":"ShowHigh"},' \
'{"Name":"ShowLow"},{"Name":"ShowPriorClose","Value":true},{"Name":"Show52WeekHigh"},' \
'{"Name":"Show52WeekLow"}],"Kind":"OpenHighLowLines","SeriesId":"i2"}]}]}&ckey=cecc4267a0'
r = requests.get(req_url, headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0",
"Content-Type": "application/json, text/javascript, */*; q=0.01",
"Dylan2010.EntitlementToken": "cecc4267a0194af89ca343805a3e57af"})
# Full Return
print(r)
# Stock UNIX Dates
print(json.loads(r.content)['TimeInfo']['Ticks'])
# Stock Prices
print(json.loads(r.content)['Series'][0]['DataPoints'])
This prints the following data (just a sample of the first 5 records returned):
<Response [200]>
# Unix Datetime Stamps
[1534944600000, 1534944660000, 1534944720000, 1534944780000, 1534944840000]
# AAPL Prices
[[214.9001], [214.81], [214.84], [215.31], [215.2]]
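Two things in the example above are easy to get wrong by hand: swapping in a new token inside the long query string, and reading the millisecond timestamps. The following is a minimal sketch of both, using only the standard library. `build_history_url` and `to_time_series` are hypothetical helper names, the caller is expected to supply the full JSON payload as a dict, and deriving `ckey` from the first 10 characters of the token is an assumption based only on the example values in this answer.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode

API_URL = "https://api-secure.wsj.net/api/michelangelo/timeseries/history"


def build_history_url(payload, token):
    """Return the API URL with the token inserted into the JSON payload."""
    body = dict(payload, EntitlementToken=token)
    query = urlencode({
        "json": json.dumps(body, separators=(",", ":")),
        # Assumption: ckey is the first 10 characters of the token,
        # as it is in the example request in this answer.
        "ckey": token[:10],
    })
    return API_URL + "?" + query


def to_time_series(ticks_ms, data_points):
    """Pair millisecond Unix timestamps with the first value of each point."""
    return [
        (datetime.fromtimestamp(ms / 1000, tz=timezone.utc), point[0])
        for ms, point in zip(ticks_ms, data_points)
    ]


# Values copied from the sample output above.
ticks = [1534944600000, 1534944660000]
prices = [[214.9001], [214.81]]
for when, price in to_time_series(ticks, prices):
    print(when.isoformat(), price)
```

The timestamps are converted to timezone-aware UTC datetimes so that they can be shifted to exchange time explicitly rather than depending on the local clock of the machine running the script.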
If you need regular access to free financial data, I strongly recommend YahooFinancials:
https://github.com/JECSand/yahoofinancials
Installation:
$ pip install yahoofinancials
Usage example:
from yahoofinancials import YahooFinancials
tech_stocks = ['AAPL', 'MSFT', 'INTC']
yahoo_financials_tech = YahooFinancials(tech_stocks)
print(yahoo_financials_tech.get_historical_price_data("2018-08-01", "2018-08-10", "weekly"))
Result:
{
"AAPL": {
"currency": "USD",
"eventsData": {
"dividends": {
"2018-08-06": {
"amount": 0.73,
"date": 1533907800,
"formatted_date": "2018-08-10"
}
}
},
"firstTradeDate": {
"date": 345459600,
"formatted_date": "1980-12-12"
},
"instrumentType": "EQUITY",
"prices": [
{
"adjclose": 207.2631072998047,
"close": 207.99000549316406,
"date": 1532923200,
"formatted_date": "2018-07-30",
"high": 208.74000549316406,
"low": 197.30999755859375,
"open": 199.1300048828125,
"volume": 163787100
},
{
"adjclose": 206.80471801757812,
"close": 207.52999877929688,
"date": 1533528000,
"formatted_date": "2018-08-06",
"high": 209.77999877929688,
"low": 204.52000427246094,
"open": 208.0,
"volume": 121618700
}
],
"timeZone": {
"gmtOffset": -14400
}
},
"INTC": {
"currency": "USD",
"eventsData": {
"dividends": {
"2018-08-06": {
"amount": 0.3,
"date": 1533562200,
"formatted_date": "2018-08-06"
}
}
},
"firstTradeDate": {
"date": 322131600,
"formatted_date": "1980-03-17"
},
"instrumentType": "EQUITY",
"prices": [
{
"adjclose": 49.33000183105469,
"close": 49.630001068115234,
"date": 1532923200,
"formatted_date": "2018-07-30",
"high": 49.779998779296875,
"low": 48.0,
"open": 48.060001373291016,
"volume": 76521400
},
{
"adjclose": 48.55471420288086,
"close": 48.849998474121094,
"date": 1533528000,
"formatted_date": "2018-08-06",
"high": 50.599998474121094,
"low": 48.29999923706055,
"open": 48.77000045776367,
"volume": 129482900
}
],
"timeZone": {
"gmtOffset": -14400
}
},
"MSFT": {
"currency": "USD",
"eventsData": {},
"firstTradeDate": {
"date": 511088400,
"formatted_date": "1986-03-13"
},
"instrumentType": "EQUITY",
"prices": [
{
"adjclose": 107.62582397460938,
"close": 108.04000091552734,
"date": 1532923200,
"formatted_date": "2018-07-30",
"high": 108.08999633789062,
"low": 104.83999633789062,
"open": 106.02999877929688,
"volume": 68392600
},
{
"adjclose": 108.58214569091797,
"close": 109.0,
"date": 1533528000,
"formatted_date": "2018-08-06",
"high": 110.16000366210938,
"low": 107.55999755859375,
"open": 108.12000274658203,
"volume": 83677700
}
],
"timeZone": {
"gmtOffset": -14400
}
}
}
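Since the goal is to compare values across sources, it may help to reduce this nested result to just the dates and closing prices per ticker. The sketch below assumes only the dictionary shape shown above; `closing_prices` is a hypothetical helper and not part of the yahoofinancials package.

```python
def closing_prices(result):
    """Map each ticker to a list of (formatted_date, close) tuples."""
    return {
        ticker: [
            (price["formatted_date"], price["close"])
            for price in data.get("prices", [])
        ]
        for ticker, data in result.items()
    }


# Trimmed sample with the same shape as the result above.
sample = {
    "MSFT": {
        "currency": "USD",
        "prices": [
            {"formatted_date": "2018-07-30", "close": 108.04000091552734},
            {"formatted_date": "2018-08-06", "close": 109.0},
        ],
    }
}
print(closing_prices(sample))
```

Passing the dictionary returned by `get_historical_price_data` in place of `sample` should give one list per ticker, provided the result keeps the structure shown above.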