Pandas 0.24.0 breaks my pandas DataFrame with special column identifiers

Asked: 2019-04-24 04:42:38

Tags: pandas python-2.7 dataframe

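My code ran fine until I tried to run it on a colleague's machine, at which point I discovered that it works with pandas 0.22.0 but breaks on pandas 0.24.0. For now we have worked around the problem by downgrading their copy of pandas, but I would like to find a better solution.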

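The problem seems to be that I am creating a user-defined class to use as the identifier for the columns of my DataFrame. When I try to compare two DataFrames, pandas for some reason tries to call the column labels as if they were functions, and then raises an exception because they are not callable.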

Here is some sample code:

import pandas as pd
import numpy as np

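# User-defined class used as a column label
class label(object):
    def __init__(self, var):
        self.var = var
    def __eq__(self, other):
        return self.var == other.var

# 5x5 identity matrix whose column labels are label instances
df = pd.DataFrame(np.eye(5), columns=[label(ii) for ii in range(5)])
df == df  # raises TypeError on pandas 0.24.0, works on 0.22.0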

This produces the following stack trace:

Traceback (most recent call last):

  File "<ipython-input-4-496e4ab3f9d9>", line 1, in <module>
    df==df1

  File "C:\...\site-packages\pandas\core\ops.py", line 2098, in f
    return dispatch_to_series(self, other, func, str_rep)

  File "C:\...\site-packages\pandas\core\ops.py", line 1157, in dispatch_to_series
    new_data = expressions.evaluate(column_op, str_rep, left, right)

  File "C:\...\site-packages\pandas\core\computation\expressions.py", line 208, in evaluate
    return _evaluate(op, op_str, a, b, **eval_kwargs)

  File "C:\...\site-packages\pandas\core\computation\expressions.py", line 68, in _evaluate_standard
    return op(a, b)

  File "C:\...\site-packages\pandas\core\ops.py", line 1135, in column_op
    for i in range(len(a.columns))}

  File "C:\...\site-packages\pandas\core\ops.py", line 1135, in <dictcomp>
    for i in range(len(a.columns))}

  File "C:\...\site-packages\pandas\core\ops.py", line 1739, in wrapper
    name=res_name).rename(res_name)

  File "C:\...\site-packages\pandas\core\series.py", line 3733, in rename
    return super(Series, self).rename(index=index, **kwargs)

  File "C:\...\site-packages\pandas\core\generic.py", line 1091, in rename
    level=level)

  File "C:\...\site-packages\pandas\core\internals\managers.py", line 171, in rename_axis
    obj.set_axis(axis, _transform_index(self.axes[axis], mapper, level))

  File "C:\...\site-packages\pandas\core\internals\managers.py", line 2004, in _transform_index
    items = [func(x) for x in index]

TypeError: 'label' object is not callable

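I found that I can work around the problem by making my class callable with a single argument and returning that argument, but this breaks .loc indexing, which by default treats my objects as callables.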
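For reference, the callable variant described above might look roughly like the sketch below. The name CallableLabel is hypothetical, and the __hash__ method is an added assumption so the snippet also runs on Python 3 (the question targets Python 2.7, where it is not needed). It only illustrates why .loc gets confused: pandas documents callables as valid .loc indexers, so any object for which callable() is true is invoked rather than looked up as a label.

```python
class CallableLabel(object):
    """Hypothetical variant of the question's `label` class."""

    def __init__(self, var):
        self.var = var

    def __eq__(self, other):
        return self.var == other.var

    def __hash__(self):
        # Assumption: added so instances stay hashable on Python 3,
        # where defining __eq__ alone disables the default hash.
        return hash(self.var)

    def __call__(self, arg):
        # Identity call: hand back whatever is passed in.
        return arg


lbl = CallableLabel(3)
echoed = lbl("anything")     # the argument comes back unchanged
is_callable = callable(lbl)  # True, which is what makes .loc call the label
```

The identity __call__ presumably satisfies the rename step shown in the traceback, since func(x) then simply returns x, but the same property is what makes the object ambiguous as an indexer.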

The problem only occurs when the custom objects are in the columns; the index handles them just fine.

Is this a bug or a change in usage, and is there a way to work around it without giving up custom labels?
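One possible way to sidestep the column-by-column dispatch seen in the traceback, offered as an untested-on-0.24.0 sketch rather than a confirmed fix, is to compare the underlying NumPy arrays and rebuild the result with the original labels. The names Label and elementwise_eq are hypothetical, the __hash__ and __ne__ methods are additions not present in the question, and the helper assumes both frames have the same shape and the same row and column order, because no label alignment takes place.

```python
import numpy as np
import pandas as pd


class Label(object):
    """Hypothetical stand-in for the question's `label` class."""

    def __init__(self, var):
        self.var = var

    def __eq__(self, other):
        # Only compare against other Label instances.
        if not isinstance(other, Label):
            return NotImplemented
        return self.var == other.var

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Assumption: added so the labels stay hashable on Python 3.
        return hash(self.var)


def elementwise_eq(left, right):
    """Compare two identically labelled frames via their NumPy values."""
    mask = left.values == right.values
    # Reattach the labels of the left frame; no alignment is performed.
    return pd.DataFrame(mask, index=left.index, columns=left.columns)


df = pd.DataFrame(np.eye(5), columns=[Label(ii) for ii in range(5)])
result = elementwise_eq(df, df)
```

Because the comparison happens on plain arrays, the DataFrame comparison operator is never invoked and the custom labels are only passed through as column labels of the result.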

0 answers:

No answers yet