Scraping Yahoo/Oath new earnings calendar format

Time: 2017-04-05 19:42:29

Tags: python html web-scraping beautifulsoup yahoo-finance

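I can't figure out why the modified Python script below doesn't work with the new earnings calendar format. It doesn't seem to match the href, which may differ significantly from the old format (dynamic JavaScript?).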

import datetime
import requests
import bs4
import csv

def get_earning_data(date,date2):
    url = "http://finance.yahoo.com/calendar/earnings?&day={}".format(date)
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0"}
    html = requests.get(url, headers=headers).text
    soup = bs4.BeautifulSoup(html, "html.parser")
    quotes = []
    for tr in soup.find_all("tr"):
        if len(tr.contents) > 3:
            if len(tr.contents[1].contents) > 0:
                if tr.contents[1].contents[0].name == "a":
                    if tr.contents[1].contents[0]["href"].startswith("/quote/"):
                        if "." not in tr.contents[1].contents[0].text: 
                            quotes.append(tr.contents[1].contents[0].text)
                            quotes.append(date2)
    return quotes

outfile = "EarningsCalendar.csv"
open(outfile, 'wb').close()  # create/truncate the output file before appending
index = 0
while index < 7:
    date = (datetime.date.today() + datetime.timedelta(index)).strftime("%Y-%m-%d")
    date2 = (datetime.date.today() + datetime.timedelta(index)).strftime("%d/%m/%Y")
    mylist = get_earning_data(date,date2)
    print (mylist)
    with open(outfile, 'ab') as csvfile:
        writer = csv.writer(csvfile, delimiter=',',quoting=csv.QUOTE_NONE)
        for i in range(0, len(mylist), 2):
            writer.writerow(mylist[i:i+2])
    index += 1    
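
A side note on the file handling in the script above: the 'wb'/'ab' modes follow Python 2 conventions. Under Python 3, csv.writer expects a file opened in text mode with newline="". The following is only a minimal sketch under that assumption; write_pairs is a hypothetical helper name, and the symbol/date pairs are illustrative placeholders, not scraped results.

```python
import csv
import os
import tempfile


def write_pairs(path, flat_list):
    """Write [symbol, date, symbol, date, ...] as two-column CSV rows."""
    # Python 3: text mode plus newline="" is what the csv module expects.
    with open(path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile, delimiter=",", quoting=csv.QUOTE_NONE)
        for i in range(0, len(flat_list), 2):
            writer.writerow(flat_list[i:i + 2])


# Placeholder pairs in the same flat layout that get_earning_data returns.
demo_path = os.path.join(tempfile.mkdtemp(), "EarningsCalendar.csv")
write_pairs(demo_path, ["KMX", "05/04/2017", "ABC", "05/04/2017"])

# Read the file back to confirm the row layout.
with open(demo_path, newline="") as f:
    rows = list(csv.reader(f))
```

Opening the file once per run in "w" mode also removes the need for the separate truncation step before the loop.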

Here is a sample row from the page source for 04-05-2017:

<tr class="data-rowKMX9 Bgc($extraLightBlue):h H(36px) Bgc($altRowColor)" data-reactid="490"><td class="data-col0 Ta(start) Pend(15px) Pstart(6px) W(10%)" data-reactid="491"><a href="/quote/KMX?p=KMX" title="Carmax Inc" data-symbol="KMX" class="Fw(b)" data-reactid="492">KMX</a></td><td class="data-col1 Ta(start) Pend(10px) W(20%)" data-reactid="493">Carmax Inc</td><td class="data-col2 Ta(end) Pstart(15px) W(10%)" data-reactid="494">0.79</td><td class="data-col3 Ta(end) Pstart(15px) W(10%)" data-reactid="495">-</td><td class="data-col4 Ta(end) Pstart(15px) W(10%)" data-reactid="496"><span class="" data-reactid="497">-</span></td><td class="data-col5 Ta(end) Pend(6px) Pstart(15px) W(13%)" data-reactid="498"><span data-reactid="499">Before Market Open</span></td></tr>

Here is a sample page showing the old format: http://web.archive.org/web/20170301070135/https://biz.yahoo.com/research/earncal/today.html

The only difference I can see between the old and new formats is the .startswith check. Using "http://finance.yahoo.com/quote/" doesn't work either.
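
One thing that can be checked against the sample row alone: it has no whitespace between the <tr> tag and its first <td>, so with "html.parser" the symbol cell is tr.contents[0], while tr.contents[1] is the company name cell, which contains no link. That may be why the index-based test never matches, although this has not been verified against the live page, and it assumes the rows are present in the HTML that requests receives rather than rendered by JavaScript. A minimal sketch that searches each row for the link instead of relying on a fixed child index follows; SAMPLE_ROW (the row above with class attributes trimmed) and extract_symbols are hypothetical names, not part of the original script.

```python
import bs4

# Sample row from the question, class attributes trimmed for brevity.
SAMPLE_ROW = (
    '<table><tr data-reactid="490">'
    '<td data-reactid="491"><a href="/quote/KMX?p=KMX" title="Carmax Inc" '
    'data-symbol="KMX" data-reactid="492">KMX</a></td>'
    '<td data-reactid="493">Carmax Inc</td>'
    '<td data-reactid="494">0.79</td>'
    '</tr></table>'
)


def is_quote_href(href):
    """True for href values that start with /quote/ (href may be None)."""
    return href is not None and href.startswith("/quote/")


def extract_symbols(html):
    """Return ticker symbols from table rows that link to /quote/..."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    symbols = []
    for tr in soup.find_all("tr"):
        # Look anywhere in the row rather than at a fixed position.
        link = tr.find("a", href=is_quote_href)
        if link is not None and "." not in link.text:
            symbols.append(link.text)
    return symbols


# Show which cell actually holds the link in the sample row.
row = bs4.BeautifulSoup(SAMPLE_ROW, "html.parser").find("tr")
first_cell, second_cell = row.contents[0], row.contents[1]
first_has_link = first_cell.find("a") is not None
second_has_link = second_cell.find("a") is not None

symbols = extract_symbols(SAMPLE_ROW)
```

If the live response turns out not to contain these rows at all, the parsing change alone would not help, and the data would have to be obtained some other way.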

0 answers:

No answers