Stack overflow with pandas_datareader

Time: 2018-04-07 03:01:31

Tags: python pandas-datareader

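I have a Python thread that queries MongoDB, and this is the first time I have run into this error. It is also the first time I have queried my database, which is large: 500 million documents. Here is the error: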

IV
BIVV
adding AAXN to retry list
adding AABA to retry list
Fatal Python error: Cannot recover from stack overflow.

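I did not write the code that prints the "adding ... to retry list" messages. It looks as if these stock symbols are being added back to a queue just before the stack overflow error, and then all the threads die.

I tried calling gc.collect on every iteration that pulls from the queue, but that did not fix it. It happens with 15 threads and with 5 threads, on the same stock symbols. I am fairly sure I do not have a memory leak. Should I just delete every variable each thread holds on each iteration? Maybe try multiprocessing instead of multithreading? Any suggestions?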

1 answer:

Answer 0: (score: -1)

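I looked more closely at the "adding" messages: they are printed by pandas_datareader, specifically mstar/daily.py.

It seems the retry count is never reduced before the method calls itself: self.retry_count -= 1 only runs after the recursive call to _dl_mult_symbols returns. As long as a symbol keeps failing, each call recurses again with the same retry count, and that unbounded recursion is what produces the stack overflow error.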

def _dl_mult_symbols(self, symbols):
    failed = []
    symbol_data = []
    for symbol in symbols:

        params = self._url_params()
        params.update({"ticker": symbol})

        try:
            resp = requests.get(self.url, params=params)
        except Exception:
            if symbol not in failed:
                if self.retry_count == 0:
                    warn("skipping symbol %s: number of retries "
                         "exceeded." % symbol)
                    pass
                else:
                    print("adding %s to retry list" % symbol)
                    failed.append(symbol)
        else:
            if resp.status_code == requests.codes.ok:
                jsondata = resp.json()
                if jsondata is None:
                    failed.append(symbol)
                    continue
                jsdata = self._restruct_json(symbol=symbol,
                                             jsondata=jsondata)
                symbol_data.extend(jsdata)
            else:
                raise Exception("Request Error!: %s : %s" % (
                    resp.status_code, resp.reason))

        time.sleep(self.pause)

    if len(failed) > 0 and self.retry_count > 0:
        # TODO: This appears to do nothing since
        # TODO: successful symbols are not added to
        self._dl_mult_symbols(symbols=failed)
        self.retry_count -= 1
    else:
        self.retry_count = 0

    if not symbol_data:
        raise ValueError('All symbols were invalid')
    elif self.retry_count == 0 and len(failed) > 0:
        warn("The following symbols were excluded do to http "
             "request errors: \n %s" % failed, SymbolWarning)

    symbols_df = DataFrame(data=symbol_data)
    dfx = symbols_df.set_index(["Symbol", "Date"])
    return dfx
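
One way to avoid the unbounded recursion is to retry in a loop with a fixed number of passes. The following is only a minimal sketch of that idea, not the library's code or an official fix: download_with_retries and fetch are hypothetical names used for illustration, and fetch stands in for whatever performs the HTTP request for one symbol and raises on failure.

```python
def download_with_retries(symbols, fetch, retry_count=3):
    """Fetch each symbol, retrying failed ones for at most `retry_count` extra passes.

    `fetch` is a hypothetical callable (an assumption for this sketch): it takes
    one symbol, returns a list of rows, and raises an exception on failure.
    Returns a tuple (rows, still_failed).
    """
    pending = list(symbols)
    rows = []
    # One initial pass plus `retry_count` retry passes; the loop bound,
    # not a counter changed after a recursive call, guarantees termination.
    for _ in range(retry_count + 1):
        failed = []
        for symbol in pending:
            try:
                rows.extend(fetch(symbol))
            except Exception:
                # Remember the symbol so the next pass can try it again.
                failed.append(symbol)
        pending = failed
        if not pending:
            break
    return rows, pending
```

Because the number of passes is fixed up front, a symbol that always fails is attempted retry_count + 1 times and then handed back in the second element of the result, instead of driving the call stack deeper on every attempt.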