Strange behavior of pandas df.apply()

Date: 2017-04-07 16:39:56

Tags: python pandas lambda apply

I have a Python class like this:

from collections import namedtuple

class MyClass(namedtuple('mytuple', 'one two')):
    def method1(self):
        pass

I also have a pandas DataFrame with 4 columns. When I try to apply the following lambda to it:

df.apply(lambda x: MyClass(x['col1'], x['col2']), axis=1)

I get the following error:

ValueError: Shape of passed values is (218039, 2), indices imply (218039, 4)

This is exactly the same error you get when you return a tuple from DataFrame.apply(), for example: df.apply(lambda x: (1,2), axis=1)

However, if I define the class as

class MyClass2:
    def __init__(self, one, two):
        self.one = one
        self.two = two

    def method1(self):
        pass

the .apply() call succeeds (it returns a DataFrame with n rows of MyClass2 objects).

Is this the expected behavior? It seems that df.apply() treats an instance of a class derived from namedtuple as if it were a plain tuple.
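To make that suspicion concrete, here is a minimal check in plain Python, with no pandas involved. It only shows that an instance of the namedtuple-based class is a genuine tuple while an instance of the plain class is not. Whether pandas 0.19.2 actually branches on this is an assumption that I have not verified in its source.

```python
from collections import namedtuple


class MyClass(namedtuple('mytuple', 'one two')):
    def method1(self):
        pass


class MyClass2:
    def __init__(self, one, two):
        self.one = one
        self.two = two

    def method1(self):
        pass


a = MyClass(1, 2)
b = MyClass2(1, 2)

# A namedtuple subclass instance is still a tuple: it has a length
# and unpacks into its fields like any other sequence.
is_tuple_a = isinstance(a, tuple)
# A plain class instance is an opaque object with no sequence behavior.
is_tuple_b = isinstance(b, tuple)

print(is_tuple_a, len(a), tuple(a))
print(is_tuple_b, hasattr(b, '__len__'))
```

So any code that inspects a return value with isinstance(..., tuple), or that tries to expand sequences into columns, would see MyClass and (1, 2) as the same kind of thing.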

Edit: following @root's comment, I tested the following in IPython:

In [30]: data = pd.DataFrame(np.random.random((4,5)));

In [33]: type(data)
Out[33]: pandas.core.frame.DataFrame 

In [33]: data.shape
Out[33]: (4, 5)

In [34]: data.apply(lambda x: (1,2),axis=1)
Out[34]: 
0    (1, 2)
1    (1, 2)
2    (1, 2)
3    (1, 2)
dtype: object

So there is no problem when starting from a freshly created DataFrame.

However, with the original data the following happens:

In [41]: type(data)
Out[41]: pandas.core.frame.DataFrame
In [42]: pd.__version__
Out[42]: '0.19.2'
In [43]: data.head(1)
Out[43]: 
       date_start            date_end  start_lng  start_lat
0 2015-12-03 16:25:18 2015-12-03 16:28:56  -8.680015  41.172069
In [44]: data.shape
Out[44]: (218039, 4)

In [45]: data.apply(lambda x: (1,2),axis=1)
Traceback (most recent call last):
  File "/Applications/Anaconda/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/internals.py", line 4263, in create_block_manager_from_arrays
    mgr = BlockManager(blocks, axes)
  File "/Applications/Anaconda/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/internals.py", line 2761, in __init__
    self._verify_integrity()
  File "/Applications/Anaconda/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/internals.py", line 2971, in _verify_integrity
    construction_error(tot_items, block.shape[1:], self.axes)
  File "/Applications/Anaconda/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/internals.py", line 4233, in construction_error
    passed, implied))

ValueError: Shape of passed values is (218039, 2), indices imply (218039, 4)

Any idea what might be happening here?
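For anyone hitting the same error in the meantime, one possible way to sidestep it is to avoid df.apply() altogether and build the objects in plain Python. This is only a sketch: the three-row frame and its values are hypothetical stand-ins for the real data, and only the column names col1 and col2 come from the lambda above.

```python
from collections import namedtuple

import pandas as pd


class MyClass(namedtuple('mytuple', 'one two')):
    def method1(self):
        pass


# Hypothetical stand-in for the real 218039-row frame.
df = pd.DataFrame({'col1': [1.0, 2.0, 3.0], 'col2': [4.0, 5.0, 6.0]})

# Build the objects row by row in plain Python, then wrap them in a Series,
# so df.apply() never has to interpret a tuple-like return value.
objects = [MyClass(a, b) for a, b in zip(df['col1'], df['col2'])]
result = pd.Series(objects, index=df.index)

print(type(result.iloc[0]).__name__, result.iloc[0].one)
```

This does not explain why the two frames behave differently; it only keeps the namedtuple-based class usable while that question is open.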

0 answers:

No answers