Let's say I have a very simple DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.full((6), 1))
Now I define a function that generates a NumPy array of random length and appends the given value to its tail:
import numpy as np
def func(row):
    l = np.full((np.random.random_integer(5)), 1)
    return np.hstack(l, row)
When I try to apply this function to df to get a 2-D array:
df.apply(func, axis=1)
I get this error:
ValueError: Shape of passed values is (6, 2), indices imply (6, 1)
Do you know what the problem is and how to fix it? Thanks in advance!
Answer 0 (score: 1)
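First, the function you need is np.random.random_integers (np.random.random_integer does not exist). Second, np.hstack expects a single tuple of arrays, so pass (l, row) rather than two separate arguments. Third, apply needs a return value it can align into a result, so in this case return a Series: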
In [213]:
df = pd.DataFrame(np.full((6), 1))
def func(row):
    l = np.full((np.random.random_integers(5)), 1)
    return pd.Series(np.hstack((l, row)))
In [214]:
df.apply(func, axis=1)
Out[214]:
     0    1    2    3    4    5
0  1.0  1.0  1.0  NaN  NaN  NaN
1  1.0  1.0  NaN  NaN  NaN  NaN
2  1.0  1.0  NaN  NaN  NaN  NaN
3  1.0  1.0  1.0  NaN  NaN  NaN
4  1.0  1.0  1.0  1.0  1.0  NaN
5  1.0  1.0  1.0  1.0  1.0  1.0
Note that running the above gave me a lot of warnings:
C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\numeric.py:301: FutureWarning: in the future, full(3, 1) will return an array of dtype('int32')
format(shape, fill_value, array(fill_value).dtype), FutureWarning)
C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\numeric.py:301: FutureWarning: in the future, full(2, 1) will return an array of dtype('int32')
format(shape, fill_value, array(fill_value).dtype), FutureWarning)
C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\numeric.py:301: FutureWarning: in the future, full(1, 1) will return an array of dtype('int32')
format(shape, fill_value, array(fill_value).dtype), FutureWarning)
C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\numeric.py:301: FutureWarning: in the future, full(4, 1) will return an array of dtype('int32')
format(shape, fill_value, array(fill_value).dtype), FutureWarning)
C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\numeric.py:301: FutureWarning: in the future, full(5, 1) will return an array of dtype('int32')
format(shape, fill_value, array(fill_value).dtype), FutureWarning)
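These warnings are raised because np.full is called without an explicit dtype, so the result type depends on the NumPy version. A minimal sketch of one way to avoid that, assuming the float output shown above is what you want; the helper name make_prefix is hypothetical and only for illustration:

```python
import numpy as np

def make_prefix(n):
    # Passing dtype explicitly means the result does not rely on
    # NumPy inferring a type from the fill value.
    return np.full(n, 1, dtype=float)

prefix = make_prefix(3)
print(prefix.dtype, prefix)
```

If you would rather have integers, dtype=int should work the same way.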
To get a NumPy array, access the values attribute of the resulting DataFrame:
df.apply(func, axis=1).values
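For reference, here is a sketch of the same approach on a more recent NumPy and pandas, where np.random.random_integers is deprecated. It assumes np.random.default_rng is available (NumPy 1.17 or later) and uses to_numpy() in place of .values; the seed is arbitrary and only there to make runs repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only

df = pd.DataFrame(np.full(6, 1))

def func(row):
    # integers(1, 6) draws from 1..5 inclusive, like random_integers(5)
    n = rng.integers(1, 6)
    prefix = np.full(n, 1)
    # hstack takes one sequence of arrays, hence the inner tuple
    return pd.Series(np.hstack((prefix, row.to_numpy())))

result = df.apply(func, axis=1)      # rows of unequal length are NaN-padded
arr = result.to_numpy(dtype=float)   # 2-D array, NaN where a row was shorter
print(arr.shape)
```

Because each row gets a prefix of random length, the number of columns varies between runs; shorter rows are padded with NaN on the right.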