Improving pandas web-scraping speed

Date: 2017-02-09 21:14:21

Tags: python pandas web-scraping

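I'm currently using pandas' pd.read_html() to pull information from Fidelity. It works fine, but I'm curious whether there is a way to make it faster. My current code is below for reference; I've included only a subset of the code I actually use.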

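import datetime

import pandas as pd

# Assumed definition: todays_date is not shown in the subset posted here.
todays_date = datetime.date.today().isoformat()

ticker_list = ['FHLC', 'ONEQ', 'FTEC']
for i in ticker_list:
    print(i)
    total = 0
    pages = []  # collect each page and concatenate once after the loop
    while True:
        try:
            url = 'http://research2.fidelity.com/fidelity/screeners/etf/public/etfholdings.asp?symbol={}&view=Holdings&page={}'.format(i, total)
            print(url)
            hd = pd.read_html(url)[0]
            # Convert a string such as '12.5%' to the fraction 0.125
            hd['Weight'] = hd['Weight'].apply(lambda x: float(x.split('%')[0]) / 100)
            print(hd['Weight'].sum())
            pages.append(hd)
            total += 1
        except Exception:
            # Any failure (no table on the page, missing column) ends this ticker
            print('Nothing grabbed at page {}'.format(total))
            break
    # DataFrame.append was removed in pandas 2.0; concat also avoids re-copying per page
    results = pd.concat(pages, ignore_index=True) if pages else pd.DataFrame()
    if not results.empty and results['Weight'].sum() > 0:
        results.to_csv('{}{}.csv'.format(i, todays_date), index=False)
    else:
        print('This fund had no information to add {}'.format(i))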
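One possible direction for the speed question, as a rough sketch rather than a tested fix: the time is most likely spent waiting on the network, so the tickers can be fetched in parallel threads while each ticker's pages are still walked in order (the page count is not known in advance). The names `fetch_page`, `fetch_holdings`, `fetch_all` and the `max_workers` value are hypothetical, introduced only for illustration. It is also an assumption that the site tolerates several simultaneous requests.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

URL = ('http://research2.fidelity.com/fidelity/screeners/etf/public/'
       'etfholdings.asp?symbol={}&view=Holdings&page={}')


def fetch_page(ticker, page):
    """Download one holdings page; pd.read_html raises if no table is found."""
    return pd.read_html(URL.format(ticker, page))[0]


def fetch_holdings(ticker, fetch=fetch_page):
    """Walk one ticker's pages in order until a page fails, as in the original loop."""
    pages = []
    page = 0
    while True:
        try:
            hd = fetch(ticker, page)
            # Vectorized version of the per-row lambda: '12.5%' becomes 0.125
            hd['Weight'] = hd['Weight'].str.rstrip('%').astype(float) / 100
            pages.append(hd)
        except Exception:
            break
        page += 1
    # Concatenate once instead of growing a DataFrame page by page
    return pd.concat(pages, ignore_index=True) if pages else pd.DataFrame()


def fetch_all(tickers, fetch=fetch_page, max_workers=8):
    """Fetch several tickers concurrently; returns {ticker: DataFrame}."""
    tickers = list(tickers)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        frames = pool.map(lambda t: fetch_holdings(t, fetch), tickers)
        return dict(zip(tickers, frames))
```

Calling `fetch_all(['FHLC', 'ONEQ', 'FTEC'])` would return one DataFrame per ticker, which can then be written out with `to_csv` as in the question. The `fetch` parameter exists only so the download step can be swapped out, for example to try the logic without hitting the site.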

0 answers:

No answers