For a few days now I have been trying (using answers from this site and from MathWorks) to get around the crumb that Yahoo Finance appends to the end of the link in order to download CSV files. For example, for a CSV of Nasdaq 100 data you get the following link in the Chrome browser: https://query1.finance.yahoo.com/v7/finance/download/%5ENDX?period1=496969200&period2=1519513200&interval=1d&events=history&crumb=dnhBC8SRS9G (obtained by clicking the "Download Data" button on this Yahoo Finance page).
This crumb=dnhBC8SRS9G apparently changes with the cookie and the user agent, so I tried to configure MATLAB accordingly so that it disguises itself as the Chrome browser (copying the cookie and user agent from Chrome):
useragent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.79 Safari/537.36';
cookie ='PRF=t%3D%255ENDX; expires=Thu, 11-Jun-2020 09:06:31 GMT; path=/; domain=.finance.yahoo.com';
opts = weboptions('UserAgent',useragent,'KeyName','WWW_Authenticate','KeyValue','dnhBC8SRS9G','KeyName','Cookie','KeyValue',cookie)
url = 'https://query1.finance.yahoo.com/v7/finance/download/^NDX?period1=496969200&period2=1519513200&interval=1d&events=history&crumb=dnhBC8SRS9G' ;
response = webread(url,opts)
But no matter what I do (using webread or the extra function urlread2), I get an "Unauthorized" response. The MATLAB code above gives this response:
Error using readContentFromWebService (line 45)
The server returned the status 401 with message "Unauthorized" in response to the request to URL
https://query1.finance.yahoo.com/v7/finance/download/%5ENDX?period1=496969200&period2=1519513200&interval=1d&events=history&crumb=dnhBC8SRS9G.
Error in webread (line 122)
[varargout{1:nargout}] = readContentFromWebService(connection, options);
Error in TEST2 (line 22)
response = webread(url,opts)
Any help is greatly appreciated; I just want to get the basics working, even if that means manually copying the crumb from the Chrome browser into MATLAB before the first request. (I have seen it solved in Python, C# and so on, and I followed those solutions as closely as I could, so it should be doable in MATLAB too, right?)
EDIT: In case it is of any help, when I run urlread2 instead of webread at the end of the code, i.e.:
[output,extras] = urlread2(url,'GET');
extras.firstHeaders
I get the following output from MATLAB:
ans =
struct with fields:
Response: 'HTTP/1.1 401 Unauthorized'
X_Content_Type_Options: 'nosniff'
WWW_Authenticate: 'crumb'
Content_Type: 'application/json;charset=utf-8'
Content_Length: '136'
Date: 'Tue, 12 Jun 2018 13:07:38 GMT'
Age: '0'
Via: 'http/1.1 media-router-omega4.prod.media.ir2.yahoo.com (ApacheTrafficServer [cMsSf ]), http/1.1 media-ncache-api17.prod.media.ir2.yahoo.com (ApacheTrafficServer [cMsSf ]), http/1.1 media-ncache-api15.prod.media.ir2.yahoo.com (ApacheTrafficServer [cMsSf ]), http/1.1 media-router-api12.prod.media.ir2.yahoo.com (ApacheTrafficServer [cMsSf ]), https/1.1 e3.ycpi.seb.yahoo.com (ApacheTrafficServer [cMsSf ])'
Server: 'ATS'
Expires: '-1'
Cache_Control: 'max-age=0, private'
Strict_Transport_Security: 'max-age=15552000'
Connection: 'keep-alive'
Expect_CT: 'max-age=31536000, report-uri="http://csp.yahoo.com/beacon/csp?src=yahoocom-expect-ct-report-only"'
Public_Key_Pins_Report_Only: 'max-age=2592000; pin-sha256="2fRAUXyxl4A1/XHrKNBmc8bTkzA7y4FB/GLJuNAzCqY="; pin-sha256="2oALgLKofTmeZvoZ1y/fSZg7R9jPMix8eVA6DH4o/q8="; pin-sha256="Gtk3r1evlBrs0hG3fm3VoM19daHexDWP//OCmeeMr5M="; pin-sha256="I/Lt/z7ekCWanjD0Cvj5EqXls2lOaThEA0H2Bg4BT/o="; pin-sha256="JbQbUG5JMJUoI6brnx0x3vZF6jilxsapbXGVfjhN8Fg="; pin-sha256="SVqWumuteCQHvVIaALrOZXuzVVVeS7f4FGxxu6V+es4="; pin-sha256="UZJDjsNp1+4M5x9cbbdflB779y5YRBcV6Z6rBMLIrO4="; pin-sha256="Wd8xe/qfTwq3ylFNd3IpaqLHZbh2ZNCLluVzmeNkcpw="; pin-sha256="WoiWRyIOVNa9ihaBciRSC7XHjliYS9VwUGOIud4PB18="; pin-sha256="cAajgxHlj7GTSEIzIYIQxmEloOSoJq7VOaxWHfv72QM="; pin-sha256="dolnbtzEBnELx/9lOEQ22e6OZO/QNb6VSSX2XHA3E7A="; pin-sha256="i7WTqTvh0OioIruIfFR4kMPnBqrS2rdiVPl/s2uC/CY="; pin-sha256="iduNzFNKpwYZ3se/XV+hXcbUonlLw09QPa6AYUwpu4M="; pin-sha256="lnsM2T/O9/J84sJFdnrpsFp3awZJ+ZZbYpCWhGloaHI="; pin-sha256="r/mIkG3eEpVdm+u/ko/cwxzOMo1bk4TyHIlByibiA5E="; pin-sha256="uUwZgwDOxcBXrQcntwu+kYFpkiVkOaezL0WYEZ3anJc="; includeSubdomains; report-uri="http://csp.yahoo.com/beacon/csp?src=yahoocom-hpkp-report-only"'
My weboptions output is:
opts =
weboptions with properties:
CharacterEncoding: 'auto'
UserAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.79 Safari/537.36'
Timeout: 5
Username: ''
Password: ''
KeyName: ''
KeyValue: ''
ContentType: 'auto'
ContentReader: []
MediaType: 'application/x-www-form-urlencoded'
RequestMethod: 'auto'
ArrayFormat: 'csv'
HeaderFields: {'Cookie' 'PRF=t%3D%255ENDX; expires=Thu, 11-Jun-2020 09:06:31 GMT; path=/; domain=.finance.yahoo.com'}
CertificateFilename: '/opt/matlab/r2017a/sys/certificates/ca/rootcerts.pem'
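One thing this output makes visible is that only the Cookie header survived: weboptions appears to keep just the last KeyName/KeyValue pair, so the crumb value passed under WWW_Authenticate never reached HeaderFields. Below is a minimal sketch of passing the headers through a single HeaderFields cell array instead (same placeholder cookie and crumb as above; on its own this may well still return 401):
%Sketch only: pass request headers as one n-by-2 cell array instead of
%repeating KeyName/KeyValue pairs (which, judging by the output above, keeps only the last pair).
useragent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.79 Safari/537.36';
cookie = 'PRF=t%3D%255ENDX; expires=Thu, 11-Jun-2020 09:06:31 GMT; path=/; domain=.finance.yahoo.com';
%Add further rows to the cell array for any extra headers you want to send.
opts = weboptions('UserAgent',useragent,'HeaderFields',{'Cookie' cookie});
url = 'https://query1.finance.yahoo.com/v7/finance/download/^NDX?period1=496969200&period2=1519513200&interval=1d&events=history&crumb=dnhBC8SRS9G';
response = webread(url,opts);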
Answer 0 (score: 2)
Well, some people have solved this with curl, and it looks like you can't do it with that specific URL. Notably, the crumb and the cookie change frequently, so I had to parse the responses of two GET requests to fetch their values every time the script runs.
I'll walk you through my attempt.
Code:
%Get cookie.
command = 'curl -s --cookie-jar cookie.txt https://finance.yahoo.com/quote/GOOG?p=GOOG';
%Execute request.
system(command);
%Read file.
cookie_file = fileread('cookie.txt');
%regexp the cookie.
cookie = regexp(cookie_file,'B\s*(.*)','tokens');
cookie = cell2mat(cookie{1});
%Print cookie to file (for curl purposes only).
file = fopen('mycookie.txt','w');
fprintf(file,'%s',cookie);
%Get request.
command = 'curl https://finance.yahoo.com/quote/GOOG?p=GOOG > goog.txt';
%Execute request.
system(command);
%Read file.
crumb_file = fileread('goog.txt');
%regexp the crumb.
crumb = regexp(crumb_file,'(?<="CrumbStore":{"crumb":")(.*)(?="},"UserStore":)','tokens');
crumb = crumb{:};
%Form the URL.
url = 'https://query1.finance.yahoo.com/v7/finance/download/AAPL?period1=1492524105&period2=1495116105&interval=1d&events=history&crumb=';
url = strcat(url,crumb);
%Form the curl command.
command = strcat('curl',{' '},'-v -L -b',{' '},'mycookie.txt',{' '},'-H',{' '},'"User-Agent:',{' '},'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.79 Safari/537.36','"',{' '},'"',url,'"');
command = command{1};
system(command);
The final curl request:
curl -v -L -b mycookie.txt -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.79 Safari/537.36" "https://query1.finance.yahoo.com/v7/finance/download/^NDX?period1=496969200&period2=1519513200&interval=1d&events=history&crumb=dSpwQstrQDp"
In the final curl request I use the following flags:
-v: verbosity
-L: follow redirects
-b: use cookie file
-H: user agent header field (tried spoofing it with my browser)
For each attempt, the response is the following:
{
"finance": {
"error": {
"code": "Unauthorized",
"description": "Invalid cookie"
}
}
}
I studied the server responses, and every header value is successfully sent by the client, yet it always results in the same error. I now suspect that you simply can't do it this way any more, as explained here. So, as users have pointed out, you may need to do your web scraping from some other location. Maybe if you find a useful URL you can open a new question, and I'd be happy to help.
Answer 1 (score: 2)
Yahoo performs a number of checks to make sure the request comes from a web browser. Check out this function, https://www.mathworks.com/matlabcentral/fileexchange/68361-yahoo-finance-data-downloader, which makes Yahoo Finance think that the request comes from a browser.
Here are a few examples of how to use this function to download and analyse market data: https://github.com/Lenskiy/market-data-functions
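For context, a minimal usage sketch; the function name getMarketDataViaYahoo, its argument order and its return type are my reading of the linked repository, so treat them as assumptions and check the examples there:
%Sketch only: assumes the File Exchange package exposes
%getMarketDataViaYahoo(symbol, startDate, endDate, interval) as in the linked repo.
symbol = 'AAPL'; %any Yahoo Finance ticker
initDate = '1-Jan-2018'; %start of the requested range
aaplData = getMarketDataViaYahoo(symbol, initDate, datetime('today'), '1d');
head(aaplData) %expected to be a table with Date, Open, High, Low, Close, Adj Close, Volume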
Answer 2 (score: 1)
Here is a script that downloads the last month's worth of data for the AAPL stock and creates a .csv file named AAPL_14-05-2018_14-06-2018 containing the Date, Open, High, Low, Close, Adj Close and Volume information (found here).
%Choose any ticker.
ticker = 'AAPL'; %'FB','AMZN'...
%Base url.
url = 'https://query1.finance.yahoo.com/v8/finance/chart/GOOG?symbol=';
%weboption constructor.
opts = weboptions();
%Start retrieving data from today.
today = datetime('now');
today.TimeZone = 'America/New_York';
%Convert dates to unix timestamp.
todayp = posixtime(today);
%Last week.
weekp = posixtime(datetime(addtodate(datenum(today),-7,'day'),'ConvertFrom','datenum'));
%Last month.
monthp = posixtime(datetime(addtodate(datenum(today),-1,'month'),'ConvertFrom','datenum'));
%Last year.
yearp = posixtime(datetime(addtodate(datenum(today),-1,'year'),'ConvertFrom','datenum'));
%Add ticker.
url = strcat(url,ticker);
%Construct url, add time intervals. The following url is for last month worth of data.
url = strcat(url,'&period1=',num2str(monthp,'%.10g'),'&period2=',num2str(todayp,'%.10g'),'&interval=','1d');
%Execute HTTP request.
data = webread(url,opts);
%Get data.
dates = flipud(datetime(data.chart.result.timestamp,'ConvertFrom','posixtime'));
high = flipud(data.chart.result.indicators.quote.high);
low = flipud(data.chart.result.indicators.quote.low);
vol = flipud(data.chart.result.indicators.quote.volume);
open = flipud(data.chart.result.indicators.quote.open);
close = flipud(data.chart.result.indicators.quote.close);
adjclose = flipud(data.chart.result.indicators.adjclose.adjclose);
%Create table.
t = table(dates,open,high,low,close,adjclose,vol);
%Format filename: ticker, start date, end date.
namefile = strcat(ticker,'_',char(datetime(monthp,'Format','dd-MM-yyyy','ConvertFrom','posixtime')),...
'_',char(datetime(todayp,'Format','dd-MM-yyyy','ConvertFrom','posixtime')),'.csv');
%Write table to file.
writetable(t,namefile);
Above I fetch the last month's worth of data. In the code comments I show how to adapt it for the last week and the last year. I can easily adapt the code and turn it into a function for you to use with any stock and time interval; you just need to tell me which time intervals you are interested in.
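For illustration, a minimal sketch of what such a wrapper could look like; the function name getYahooHistory and its period argument are illustrative rather than part of the answer above, and it reuses the v8 chart endpoint and field names from the script, which may change on Yahoo's side:
function t = getYahooHistory(ticker, period)
%GETYAHOOHISTORY Sketch of a wrapper around the v8 chart endpoint used above.
%   t = getYahooHistory('AAPL','month') returns a table of daily data for the
%   last 'week', 'month' or 'year'. Field names follow the script above.
    today = datetime('now','TimeZone','America/New_York');
    switch lower(period)
        case 'week',  from = today - calweeks(1);
        case 'month', from = today - calmonths(1);
        case 'year',  from = today - calyears(1);
        otherwise,    error('period must be ''week'', ''month'' or ''year''');
    end
    %Build the request URL with unix timestamps, as in the script above.
    url = sprintf(['https://query1.finance.yahoo.com/v8/finance/chart/%s' ...
        '?symbol=%s&period1=%.10g&period2=%.10g&interval=1d'], ...
        ticker, ticker, posixtime(from), posixtime(today));
    %Execute the HTTP request and unpack the quote arrays into a table.
    data  = webread(url, weboptions());
    res   = data.chart.result;
    quote = res.indicators.quote;
    dates = datetime(res.timestamp,'ConvertFrom','posixtime');
    t = table(dates, quote.open, quote.high, quote.low, quote.close, ...
        res.indicators.adjclose.adjclose, quote.volume, ...
        'VariableNames',{'dates','open','high','low','close','adjclose','vol'});
end
Called as, for example, t = getYahooHistory('AAPL','week').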