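Here's my situation. I'm trying to build a huge database containing all historical data (January 1, 2017 to June 30, 2019) for NYSE and NASDAQ stocks, along with their indicators.
All 4000+ stocks are stored in a single table named "ALLSTOCKS". This table is updated daily from my CSV upload.
The CSV download contains the open, high, low, and close of each individual stock, and these are stored in their respective columns. From those numbers, my Python code performs calculations automatically. A good example of these calculations is the 9-, 20-, 50-, and 100-day average of the close.
To do this, I take the last 9, 20, 50, and 100 Close values of each stock, run a simple MySQL AVG() function over them, and store the result in its assigned column (e.g. MA9) in the "ALLSTOCKS" table.
I did mention that there are more than 4000 stocks, right? So I decided to put the averaging formula inside a FOR LOOP.
Here's some of my code: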
for ticker in tickers:
    # Average of the 9 most recent closes for this ticker (one query per ticker).
    mycursor.execute("SELECT format(AVG(Close),4) from (select Close from _PSEI where stock = %s ORDER BY ID DESC LIMIT 0,9) _PSEI", (ticker,))
    manine = mycursor.fetchone()[0]
    # Same again for the 20 most recent closes (a second query per ticker).
    mycursor.execute("SELECT format(AVG(Close),4) from (select Close from _PSEI where stock = %s ORDER BY ID DESC LIMIT 0,20) _PSEI", (ticker,))
    matwenty = mycursor.fetchone()[0]
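Anyway, the problem is that this is a for loop over an array of 4000+ items, and the results are slow. My code takes roughly 0.3 to 0.5 seconds per stock, and completing the whole loop takes up to 2000 seconds.
Here is part of the array: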
tickers = ["CHK","BAC","GE","VALE","T","F","PFE","GGB","ECA","SWN","BBD","GME","RRC","FCX","AUY","AVP","APC","KGC","PBR","WFC","S","NBR","DB","C","SAN","KO","PG","RIG","HAL","MRK","X","NOK","APA","DNR","JPM","NLY","MRO","GFI","VZ","RF","XOM","NEM","NKE","HPQ","MS","CLF","DAL","SLB","M","ESV","V","KR","CTL","KEY","JCP","OXY","DIS","BP","CIG","EOG","IAG","MO","GM","RIO","EQT","GOL","HMY","ABB","DVN","MGM"]
Is there any way to make this faster? Can you suggest a quicker approach? What would you do in this situation?
Answer 0 (score: 1)
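This takes a long time because you are running a query (a database call) for each of the 4000+ stocks.
I would try running a single query that covers all the stocks, for example: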
mycursor.execute("SELECT ... where stock in ('CHK', 'GE', 'BAC', ...) ...")
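The tradeoff should be worth it, because this one heavy query is called only once (compared with the ~4000 queries executed by the current implementation).
In general, performance is much better when you make fewer calls to the DB (and bring back more data in each call), because every call carries significant overhead.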
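As a rough sketch of the single-query idea (not a drop-in implementation), the code below fetches every close in one call and computes the averages in Python. It assumes the table and column names from the question (`_PSEI`, `stock`, `Close`, `ID`) and that `Close` is never NULL. `moving_averages` is a hypothetical helper name, and `sqlite3` is used only so the example runs without a MySQL server. The query takes no parameters, so the same function should accept the question's `mycursor` unchanged.

```python
import sqlite3
from collections import defaultdict

WINDOWS = (9, 20, 50, 100)


def moving_averages(cursor, windows=WINDOWS):
    """Return {stock: {window: average close}} using one database call.

    Hypothetical helper: assumes a table _PSEI(ID, stock, Close) where a
    larger ID means a more recent row and Close is never NULL.
    """
    # One round trip: all rows, newest first within each stock.
    cursor.execute("SELECT stock, Close FROM _PSEI ORDER BY stock, ID DESC")

    longest = max(windows)
    closes = defaultdict(list)
    for stock, close in cursor.fetchall():
        # Keep only as many recent closes as the largest window needs.
        if len(closes[stock]) < longest:
            closes[stock].append(float(close))

    result = {}
    for stock, values in closes.items():
        # Like AVG() over "LIMIT 0,w", a stock with fewer than w rows
        # is averaged over the rows it has.
        result[stock] = {
            w: round(sum(values[:w]) / len(values[:w]), 4) for w in windows
        }
    return result


# Small in-memory demonstration with made-up rows.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE _PSEI (ID INTEGER PRIMARY KEY, stock TEXT, Close REAL)")
sample = [("AAA", float(i)) for i in range(1, 11)] + [("BBB", 2.0), ("BBB", 4.0)]
cur.executemany("INSERT INTO _PSEI (stock, Close) VALUES (?, ?)", sample)
print(moving_averages(cur, windows=(3, 9)))
```

Note that this returns rounded floats rather than the formatted strings that `format(AVG(Close),4)` produces. Pulling every row is the simplest version of the idea. If the table is very large, restricting the query to recent rows would reduce the data transferred, but that depends on details the question does not give.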