How can I speed up my process?

Time: 2018-03-11 01:16:06

Tags: python performance screen-scraping

I wrote a script that scrapes web data for a list of stocks. The scraper has to pull data from two separate pages, so each stock symbol requires scraping two different pages. Running the process on a list of 1,000 items takes roughly 30 minutes. That's not terrible, I can set it and forget it, but I'd like to know whether there's a way to speed it up. Maybe store the data and write it all out at the end instead of on every loop iteration? Any other ideas appreciated.
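On the write-at-the-end idea: buffering rows is easy to try, though the CSV writes are almost certainly not where the 30 minutes go (the network round trips are). A minimal sketch of the pattern, where `rows` is a hypothetical list of `(symbol, eps5yr, epsttm)` tuples collected during the loop:

```python
import csv

def write_results(rows, path='industrials.csv'):
    """Buffer all (symbol, eps5yr, epsttm) tuples in memory, then write once."""
    with open(path, 'a', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Symbol', '5 Yr EPS', 'EPS TTM'])
        writer.writerows(rows)
```

Since `open()` already buffers writes, the measurable gain from this alone is likely small.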

import csv

import requests
from bs4 import BeautifulSoup  # BeautifulSoup 4; the old "BeautifulSoup" (BS3) package is unmaintained
from progressbar import ProgressBar

symbols = {'AMBTQ', 'AABA', 'AAOI', 'AAPL', 'AAWC', 'ABEC', 'ABQQ', 'ACFN', 'ACIA', 'ACIW', 'ACLS'}
pbar = ProgressBar()

with open('industrials.csv', 'a', newline='') as csv_file:  # 'ab' is the Python 2 idiom; use 'a' with newline='' on Python 3
    writer = csv.writer(csv_file, delimiter=',')
    writer.writerow(['Symbol', '5 Yr EPS', 'EPS TTM'])
    for s in pbar(symbols):
        # Page 1: fundamentals (5-year EPS growth)
        try:
            url1 = 'https://research.tdameritrade.com/grid/public/research/stocks/fundamentals?symbol='
            response1 = requests.get(url1 + s)
            soup1 = BeautifulSoup(response1.content, 'html.parser')
            hist_div = soup1.find("div", {"data-module-name": "HistoricGrowthAndShareDetailModule"})
            EPS5yr = hist_div.find('label').text
        except Exception:
            EPS5yr = 'Bad Data'

        # Page 2: summary (trailing-twelve-month EPS)
        try:
            url2 = 'https://research.tdameritrade.com/grid/public/research/stocks/summary?symbol='
            response2 = requests.get(url2 + s)
            soup2 = BeautifulSoup(response2.content, 'html.parser')
            summary_div = soup2.find("div", {"data-module-name": "StockSummaryModule"})
            EPSttm = summary_div.findAll("dd")[11].text
        except Exception:
            EPSttm = 'Bad Data'

        writer.writerow([s, EPS5yr, EPSttm])
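The loop above is network-bound: each symbol waits for two sequential HTTP round trips. The biggest speed-up is to issue the requests concurrently. A sketch using `concurrent.futures` from the standard library; `fetch` here is an injected callable that maps a URL to page content (in real use, a wrapper around `requests.Session().get`, which also reuses connections), so the concurrency logic stays separate from the parsing above:

```python
from concurrent.futures import ThreadPoolExecutor

FUND_URL = 'https://research.tdameritrade.com/grid/public/research/stocks/fundamentals?symbol='
SUMM_URL = 'https://research.tdameritrade.com/grid/public/research/stocks/summary?symbol='

def fetch_all(symbols, fetch, max_workers=10):
    """Fetch both pages for every symbol concurrently.

    fetch is any callable mapping a URL to its page content, e.g.
    lambda url: session.get(url).content for a shared requests.Session.
    Returns a dict of symbol -> (fundamentals_html, summary_html).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every request up front so the pool can overlap the waits...
        futures = {s: (pool.submit(fetch, FUND_URL + s),
                       pool.submit(fetch, SUMM_URL + s))
                   for s in symbols}
        # ...then collect results; .result() re-raises any fetch error.
        return {s: (f1.result(), f2.result())
                for s, (f1, f2) in futures.items()}
```

With 10 workers the wall-clock time should drop roughly in proportion, though `max_workers` is a guess: too many simultaneous requests may get you rate-limited by the site, so tune it down if you start seeing errors.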

0 Answers