使用python yfinance多线程下载Yahoo股票历史记录

时间:2020-03-05 16:59:41

标签: python multithreading yfinance

我正在尝试下载代码清单的历史数据,并将其分别导出到csv文件。我可以将它作为for循环来使用,但是当股票行情自动收录器的列表在1000时非常慢。我正在尝试对进程进行多线程处理,但是我不断收到许多不同的错误。有时它将仅下载1个文件,而其他时间则下载2或3个文件,甚至下载6个文件,但从未超过此文件。我猜想这与6核12线程处理器有关,但我真的不知道。

import csv
import os
import yfinance as yf
import pandas as pd
from threading import Thread

ticker_list = []

with open('tickers.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    name = None
    for row in reader:
        if row[0]:
            ticker_list.append(row[0])

start_date = '2019-03-03'
end_date = '2020-03-04'

data = pd.DataFrame()

def y_hist(i):
    ticker = ticker_list[i]
    data = yf.download(ticker, start=start_date, end=end_date, group_by="ticker")
    data.to_csv('yhist/' + ticker + '.csv', sep=',', encoding='utf-8')

threads = []

for i in range(os.cpu_count()):
    print('registering thread %d' % i)
    threads.append(Thread(target=y_hist,args=(i,)))

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

print('done')

这是csv的示例文件,带有行情自动收录器,足以测试此文件。 ticker.csv

这些是我为了使用这项功能而阅读并使用的代码的页面:

multithreading-to-scrap-yahoo-finance

Engineer Man threads

an-introduction-to-asynchronous-programming-in-python

这是其输出的简化版本,也许有助于澄清问题。

import os
import pandas as pd
import yfinance as yf
from threading import Thread

ticker_list = ['IBM','MSFT','QQQ','SPY','FB','XLV','XLF','XLK','XLE','GTHX','IYR','ONE','ROG','OLED','GLD']

def y_hist():
    for ticker in ticker_list:
        print(ticker)

threads = []

for i in range(os.cpu_count()):
    threads.append(Thread(target=y_hist))

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

输出:

IBM
MSFT
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
GLD
IBM
MSFT
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
GLD
IBM
MSFT
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
IBM
MSFT
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
GLD
OLEDIBM
MSFT
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
GLD
IBM
MSFT
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
IBM
GLD
MSFT
ROG
OLED
GLD

QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
GLD
IBM
MSFT
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
GLD
IBM
MSFT
QQQ
SPY
IBM
MSFT
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
GLD
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
GLD
IBM
MSFT
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
IBM
MSFT
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
GLD
GLD

1 个答案:

答案 0 :(得分:1)

虽然这不能直接修复我的损坏代码,但它是一种可以得到相同结果的解决方案。它使用内置的yfinance多线程功能。不幸的是,我仍然不知道为什么原始代码无法正常工作,并且仍然会感谢您提供反馈。同时,如果有人正在寻找解决同一问题的方法,这将起作用。

import csv
import os
import yfinance as yf
import pandas as pd
import time
start = time.time()

ticker_list = []

with open('tickers.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    name = None
    for row in reader:
        if row[0]:
            ticker_list.append(row[0])

data = yf.download(
        tickers = ticker_list,
        period = '1y',
        interval = '1d',
        group_by = 'ticker',
        auto_adjust = False,
        prepost = False,
        threads = True,
        proxy = None
    )

data = data.T

for ticker in ticker_list:
    data.loc[(ticker,),].T.to_csv('yhist/' + ticker + '.csv', sep=',', encoding='utf-8')

print('It took', time.time()-start, 'seconds.')

运行400条行情清单的时间:

将线程设置为True

[********************* 100%*********************** ] 400的400已完成

花了23.420897006988525秒。

将线程设置为False

[********************* 100%*********************** ] 400的400已完成

花了133.77732181549072秒。