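I have a Python class similar to:
from collections import namedtuple

class MyClass(namedtuple('mytuple', 'one two')):
    def method1(self):
        pass
and a Pandas DataFrame with 4 columns. When I try to apply the following lambda to the DataFrame: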
df.apply(lambda x: MyClass(x['col1'], x['col2']), axis=1)
I get the following error:
ValueError: Shape of passed values is (218039, 2), indices imply (218039, 4)
This is exactly the same error you get when you try to return a tuple from DataFrame.apply(), for example:
df.apply(lambda x: (1,2), axis=1)
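However, if I instead define the class as
class MyClass2:
    def __init__(self, one, two):
        self.one = one
        self.two = two

    def method1(self):
        pass
the .apply() call succeeds and returns n rows of MyClass2 objects.
Is this the expected behaviour? It seems that df.apply() treats an instance of a class derived from namedtuple as if it were a plain named tuple.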
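To make the suspicion above concrete, here is a small sketch using only the standard library. It shows that an instance of the namedtuple subclass is itself a tuple (and has a length), while an instance of the plain class is not. This only illustrates the difference between the two classes; it is an assumption on my part, not something I have confirmed in the pandas source, that apply() branches on this property.

```python
from collections import namedtuple


class MyClass(namedtuple('mytuple', 'one two')):
    def method1(self):
        pass


class MyClass2:
    def __init__(self, one, two):
        self.one = one
        self.two = two

    def method1(self):
        pass


a = MyClass(1, 2)
b = MyClass2(1, 2)

# A namedtuple subclass instance is still a tuple and behaves like a sequence.
is_tuple_a = isinstance(a, tuple)
len_a = len(a)

# A plain class instance is not a tuple, so it looks like a scalar object.
is_tuple_b = isinstance(b, tuple)

print(is_tuple_a, len_a, is_tuple_b)
```

Any code that checks whether a returned value is sequence-like would therefore see MyClass instances the same way it sees (1, 2).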
Edit: following @root's comment, I tested the following in IPython:
In [30]: data = pd.DataFrame(np.random.random((4,5)));
In [33]: type(data)
Out[33]: pandas.core.frame.DataFrame
In [33]: data.shape
Out[33]: (4, 5)
In [34]: data.apply(lambda x: (1,2),axis=1)
Out[34]:
0 (1, 2)
1 (1, 2)
2 (1, 2)
3 (1, 2)
dtype: object
So there is no problem with a freshly created DataFrame of random values.
However, with my original data the following happens:
In [41]: type(data)
Out[41]: pandas.core.frame.DataFrame
In [42]: pd.__version__
Out[42]: '0.19.2'
In [43]: data.head(1)
Out[43]:
            date_start             date_end  start_lng  start_lat
0  2015-12-03 16:25:18  2015-12-03 16:28:56  -8.680015  41.172069
In [44]: data.shape
Out[44]: (218039, 4)
In [45]: data.apply(lambda x: (1,2),axis=1)
Traceback (most recent call last):
  File "/Applications/Anaconda/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/internals.py", line 4263, in create_block_manager_from_arrays
    mgr = BlockManager(blocks, axes)
  File "/Applications/Anaconda/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/internals.py", line 2761, in __init__
    self._verify_integrity()
  File "/Applications/Anaconda/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/internals.py", line 2971, in _verify_integrity
    construction_error(tot_items, block.shape[1:], self.axes)
  File "/Applications/Anaconda/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/internals.py", line 4233, in construction_error
    passed, implied))
ValueError: Shape of passed values is (218039, 2), indices imply (218039, 4)
Any idea what might be going on here?
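For what it's worth, one way to sidestep the problem might be to avoid apply() altogether and build the objects with a list comprehension over the two columns. The sketch below is only an illustration under assumptions: the frame is a tiny stand-in for my real data, the column names col3 and col4 and all the values are made up, and I have not checked this against every pandas version.

```python
from collections import namedtuple

import pandas as pd


class MyClass(namedtuple('mytuple', 'one two')):
    def method1(self):
        pass


# Hypothetical 4-column frame with mixed dtypes standing in for the real data.
df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': ['a', 'b', 'c'],
    'col3': [0.1, 0.2, 0.3],
    'col4': [True, False, True],
})

# Build the objects row by row without going through DataFrame.apply().
objects = [MyClass(one, two) for one, two in zip(df['col1'], df['col2'])]

# Wrap them in a Series that shares the original index.
result = pd.Series(objects, index=df.index)

print(result)
```

This does not explain the error, but it gives one MyClass object per row, aligned with the index of the original frame.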