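I can't figure out why the modified Python script below doesn't work with the new earnings calendar format. It doesn't seem to match the href, which may differ significantly from the old format (dynamic JavaScript?).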
import datetime
import requests
import bs4
import csv


def get_earning_data(date, date2):
    # Fetch the earnings calendar page for one day.
    url = "http://finance.yahoo.com/calendar/earnings?&day={}".format(date)
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0"}
    html = requests.get(url, headers=headers).text
    soup = bs4.BeautifulSoup(html, "html.parser")
    quotes = []
    for tr in soup.find_all("tr"):
        # Positional checks: expect the ticker link as the first child of tr.contents[1].
        if len(tr.contents) > 3:
            if len(tr.contents[1].contents) > 0:
                if tr.contents[1].contents[0].name == "a":
                    if tr.contents[1].contents[0]["href"].startswith("/quote/"):
                        # Skip symbols that contain a dot.
                        if "." not in tr.contents[1].contents[0].text:
                            quotes.append(tr.contents[1].contents[0].text)
                            quotes.append(date2)
    return quotes


outfile = "EarningsCalendar.csv"
# Create or truncate the output file.
open(outfile, 'wb').close()

index = 0
while index < 7:
    # Same day in two formats: one for the URL, one for the CSV.
    date = (datetime.date.today() + datetime.timedelta(index)).strftime("%Y-%m-%d")
    date2 = (datetime.date.today() + datetime.timedelta(index)).strftime("%d/%m/%Y")
    mylist = get_earning_data(date, date2)
    print(mylist)
    with open(outfile, 'ab') as csvfile:
        writer = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_NONE)
        # mylist alternates symbol, date; write one pair per row.
        for i in range(0, len(mylist), 2):
            writer.writerow(mylist[i:i+2])
    index += 1
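One thing to rule out separately from the matching problem: under Python 3, `csv.writer` needs a file opened in text mode, so the `'wb'` and `'ab'` modes above would raise a `TypeError` on the first `writerow` call. A minimal sketch of the Python 3 form follows. `write_rows` is a hypothetical helper, and the symbol, date, and temporary path are example values only.

```python
import csv
import os
import tempfile


def write_rows(path, flat_list):
    """Hypothetical helper: write [symbol, date, symbol, date, ...] as two-column rows."""
    # Python 3: text mode with newline="" lets the csv module control line endings.
    with open(path, "a", newline="") as csvfile:
        writer = csv.writer(csvfile, delimiter=",", quoting=csv.QUOTE_NONE)
        for i in range(0, len(flat_list), 2):
            writer.writerow(flat_list[i:i + 2])


# Example values, written to a temporary directory rather than the real output file.
path = os.path.join(tempfile.mkdtemp(), "EarningsCalendar.csv")
write_rows(path, ["KMX", "05/04/2017"])
with open(path, newline="") as f:
    print(f.read())
```

Under Python 2 the binary modes are the right choice for `csv`, so this only matters if the interpreter changed along with the page format.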
Here is a sample row from the page source for 04-05-2017:
<tr class="data-rowKMX9 Bgc($extraLightBlue):h H(36px) Bgc($altRowColor)" data-reactid="490"><td class="data-col0 Ta(start) Pend(15px) Pstart(6px) W(10%)" data-reactid="491"><a href="/quote/KMX?p=KMX" title="Carmax Inc" data-symbol="KMX" class="Fw(b)" data-reactid="492">KMX</a></td><td class="data-col1 Ta(start) Pend(10px) W(20%)" data-reactid="493">Carmax Inc</td><td class="data-col2 Ta(end) Pstart(15px) W(10%)" data-reactid="494">0.79</td><td class="data-col3 Ta(end) Pstart(15px) W(10%)" data-reactid="495">-</td><td class="data-col4 Ta(end) Pstart(15px) W(10%)" data-reactid="496"><span class="" data-reactid="497">-</span></td><td class="data-col5 Ta(end) Pend(6px) Pstart(15px) W(13%)" data-reactid="498"><span data-reactid="499">Before Market Open</span></td></tr>
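To test the matching logic offline, the sample row can be parsed on its own. In this row the `<td>` cells follow each other with no whitespace between them, so `tr.contents[0]` is the cell holding the link and `tr.contents[1]` is the company-name cell. That may be why the positional checks in the script never match, though I have not confirmed it against the live page. The sketch below assumes `bs4` is installed. `SAMPLE_ROW` is a shortened copy of the row above, and `symbols_from_rows` is a hypothetical helper that searches each row for the link instead of indexing into `tr.contents`.

```python
import bs4

# Shortened copy of the sample row (class and data-reactid attributes dropped),
# wrapped in <table> so it is a complete fragment.
SAMPLE_ROW = (
    '<table><tr>'
    '<td><a href="/quote/KMX?p=KMX" title="Carmax Inc" data-symbol="KMX">KMX</a></td>'
    '<td>Carmax Inc</td><td>0.79</td><td>-</td>'
    '<td><span>-</span></td><td><span>Before Market Open</span></td>'
    '</tr></table>'
)


def symbols_from_rows(html):
    """Hypothetical helper: collect ticker symbols from /quote/ links in table rows."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    symbols = []
    for tr in soup.find_all("tr"):
        # Look for the link anywhere in the row, independent of cell position.
        link = tr.find("a", href=lambda h: h is not None and h.startswith("/quote/"))
        if link is not None and "." not in link.text:
            symbols.append(link.text)
    return symbols


row = bs4.BeautifulSoup(SAMPLE_ROW, "html.parser").find("tr")
print(row.contents[0].contents[0])       # first cell: the <a> tag
print(str(row.contents[1].contents[0]))  # second cell: plain text, not a tag
print(symbols_from_rows(SAMPLE_ROW))
```

Searching by `href` keeps the check independent of whether whitespace text nodes sit between the cells, which is the part that appears to differ between page versions.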
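Here is a sample page showing the old format: http://web.archive.org/web/20170301070135/https://biz.yahoo.com/research/earncal/today.html
The only difference I can see between the old and new formats is the argument to .startswith. Using "http://finance.yahoo.com/quote/" doesn't work either.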
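Note that the href in the sample row is relative (`/quote/KMX?p=KMX`), so a prefix check against the absolute URL cannot match it. A minimal sketch of a check that accepts both forms by comparing only the path component; `is_quote_link` is a hypothetical helper name, and the absolute URL in the example is built from the host used in the script.

```python
from urllib.parse import urlsplit


def is_quote_link(href):
    """Hypothetical helper: True if the link's path sits under /quote/."""
    # urlsplit separates scheme, host, path, and query, so relative and
    # absolute links can be compared on the path alone.
    return urlsplit(href).path.startswith("/quote/")


print(is_quote_link("/quote/KMX?p=KMX"))
print(is_quote_link("http://finance.yahoo.com/quote/KMX?p=KMX"))
print(is_quote_link("/calendar/earnings"))
```

The first two calls return `True` and the last returns `False`.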