Shortening large stack traces when using libraries

Time: 2017-08-23 15:35:32

Tags: python python-3.x stack-trace

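I often work with large libraries such as pandas and matplotlib.

This means that exceptions often produce long stack traces.

Since the error is very rarely in the library and very often in my own code, in the vast majority of cases I don't need to see the library details.

A couple of common examples:

Pandas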

>>> import pandas as pd
>>> df = pd.DataFrame(dict(a=[1,2,3]))
>>> df['b'] # Hint: there _is_ no 'b'

Here I tried to access an unknown key. This simple error produces a stack trace of 28 lines:

Traceback (most recent call last):
  File "an_arbitrary_python\lib\site-packages\pandas\core\indexes\base.py", line 2393, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1207, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1215, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)
KeyError: 'b'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "an_arbitrary_python\lib\site-packages\pandas\core\frame.py", line 2062, in __getitem__
    return self._getitem_column(key)
  File "an_arbitrary_python\lib\site-packages\pandas\core\frame.py", line 2069, in _getitem_column
    return self._get_item_cache(key)
  File "an_arbitrary_python\lib\site-packages\pandas\core\generic.py", line 1534, in _get_item_cache
    values = self._data.get(item)
  File "an_arbitrary_python\lib\site-packages\pandas\core\internals.py", line 3590, in get
    loc = self.items.get_loc(item)
  File "an_arbitrary_python\lib\site-packages\pandas\core\indexes\base.py", line 2395, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1207, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1215, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)
KeyError: 'b'

Knowing that I ended up in hashtable_class_helper.pxi is almost useless to me. I need to know where in my own code I messed up.

Matplotlib

>>> import matplotlib.pyplot as plt
>>> import matplotlib.cm as cm
>>> def foo():
...     plt.plot([1,2,3], cbap=cm.Blues) # cbap is a typo for cmap
...
>>> def bar():
...     foo()
...
>>> bar()

This time there is a typo in my keyword argument. But I still have to look at a 25-line stack trace:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in bar
  File "<stdin>", line 2, in foo
  File "an_arbitrary_python\lib\site-packages\matplotlib\pyplot.py", line 3317, in plot
    ret = ax.plot(*args, **kwargs)
  File "an_arbitrary_python\lib\site-packages\matplotlib\__init__.py", line 1897, in inner
    return func(ax, *args, **kwargs)
  File "an_arbitrary_python\lib\site-packages\matplotlib\axes\_axes.py", line 1406, in plot
    for line in self._get_lines(*args, **kwargs):
  File "an_arbitrary_python\lib\site-packages\matplotlib\axes\_base.py", line 407, in _grab_next_args
    for seg in self._plot_args(remaining, kwargs):
  File "an_arbitrary_python\lib\site-packages\matplotlib\axes\_base.py", line 395, in _plot_args
    seg = func(x[:, j % ncx], y[:, j % ncy], kw, kwargs)
  File "an_arbitrary_python\lib\site-packages\matplotlib\axes\_base.py", line 302, in _makeline
    seg = mlines.Line2D(x, y, **kw)
  File "an_arbitrary_python\lib\site-packages\matplotlib\lines.py", line 431, in __init__
    self.update(kwargs)
  File "an_arbitrary_python\lib\site-packages\matplotlib\artist.py", line 885, in update
    for k, v in props.items()]
  File "an_arbitrary_python\lib\site-packages\matplotlib\artist.py", line 885, in <listcomp>
    for k, v in props.items()]
  File "an_arbitrary_python\lib\site-packages\matplotlib\artist.py", line 878, in _update_property
    raise AttributeError('Unknown property %s' % k)
AttributeError: Unknown property cbap

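Here I learn that a line in artist.py raised an AttributeError, and directly below it I see that an AttributeError was indeed raised. In information terms, that adds very little.

In these trivial interactive examples you might just say "look at the top of the stack trace, not the bottom", but often my silly typo occurs inside a function, so the line I am interested in sits somewhere in the middle of these library-cluttered stack traces.

Is there any way to make these stack traces less verbose and help me find the source of the problem, which almost always lies in my own code and not in the libraries I happen to be using?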

1 Answer:

Answer 0 (score: 1):

You can use traceback to get better control over how exceptions are printed. For example:

import pandas as pd
import traceback

try:
    df = pd.DataFrame(dict(a=[1,2,3]))
    df['b']

except Exception:
    traceback.print_exc(limit=1)
    exit(1)

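This triggers the standard exception printing mechanism, but only displays the first frame of the stack trace (which in your example is the one you care about). For me this produces: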

Traceback (most recent call last):
  File "t.py", line 6, in <module>
    df['b']
KeyError: 'b'

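Obviously you lose the context, which matters when debugging your own code. If we want to get fancy, we can devise a test that decides how far down the traceback should go. For example: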

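def find_depth(tb, continue_test):
    # Count the leading frames whose filename passes continue_test
    depth = 0

    while tb is not None:
        filename = tb.tb_frame.f_code.co_filename

        # Run the test we're given against the filename
        if not continue_test(filename):
            return depth

        tb = tb.tb_next
        depth += 1

    # Every frame passed the test, so keep them all
    return depth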
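
To see what depth such a test produces, here is a small self-contained sketch that uses the standard-library json module as a stand-in for a large library. The names load_config and short_traceback, and the "same file as my own function" test, are assumptions made up for this illustration and not part of your code.

```python
import json
import sys
import traceback


def find_depth(tb, continue_test):
    # Count the leading frames whose filename passes continue_test
    depth = 0
    while tb is not None:
        if not continue_test(tb.tb_frame.f_code.co_filename):
            return depth
        tb = tb.tb_next
        depth += 1
    return depth


def load_config(text):
    # Hypothetical "own code" that calls into a library (json here)
    return json.loads(text)


def short_traceback(text):
    # Assumption: "my code" means frames defined in the same file as load_config
    own_file = load_config.__code__.co_filename
    try:
        load_config(text)
    except Exception:
        depth = find_depth(sys.exc_info()[2],
                           lambda filename: filename == own_file)
        # A positive limit keeps only the first `depth` frames
        return depth, traceback.format_exc(limit=depth)
    return 0, ""


if __name__ == "__main__":
    depth, text = short_traceback("{bad")
    print(depth)
    print(text)
```

With the malformed input "{bad", the walk should stop at the first frame inside json, so only the frames from your own file are kept ahead of the final exception line.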

I don't know how you organize and run your code, but perhaps you can do something like this:

import pandas as pd
import traceback
import sys

def find_depth(tb, continue_test):
    ...  # code from above here

try:
    df = pd.DataFrame(dict(a=[1, 2, 3]))
    df['b']

except Exception:
    traceback.print_exc(limit=find_depth(
        sys.exc_info()[2],
        # The test for which frames we should include.
        # Note: co_filename is often a full path, so adapt this test to your layout.
        lambda filename: filename.startswith('my_module')
    ))
    exit(1)
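
If you would rather hide library frames everywhere instead of wrapping each call in try/except, one possible variation (not something the code above does) is to filter the extracted frames and install the result as sys.excepthook. This is only a sketch: format_own_frames, is_library and install_short_excepthook are hypothetical names, and treating any path that contains "site-packages" as library code is an assumption that may not match your environment.

```python
import sys
import traceback


def is_library(filename):
    # Assumption: third-party code lives under a site-packages directory
    return "site-packages" in filename


def format_own_frames(exc_type, exc_value, tb, is_library=is_library):
    # Turn the traceback into a list of FrameSummary objects
    frames = traceback.extract_tb(tb)
    kept = [frame for frame in frames if not is_library(frame.filename)]
    if not kept:
        # Nothing of ours is on the stack, so fall back to the full trace
        kept = list(frames)
    lines = ["Traceback (most recent call last, library frames hidden):\n"]
    lines += traceback.format_list(kept)
    lines += traceback.format_exception_only(exc_type, exc_value)
    return "".join(lines)


def install_short_excepthook():
    # Replace the default printer used for uncaught exceptions
    def hook(exc_type, exc_value, tb):
        sys.stderr.write(format_own_frames(exc_type, exc_value, tb))
    sys.excepthook = hook
```

Calling install_short_excepthook() once at the start of a script would apply the filter to every uncaught exception. Unlike the limit approach, this keeps your own frames even when they sit between library frames, at the cost of hiding chained "During handling of the above exception" context.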