I am trying to figure out why pandas.DataFrame.mean() works on an ndarray of ndarrays, but pandas.DataFrame.std() does not work on the same data. Below is a minimal example.
import numpy as np
import pandas as pd
x = np.array([1,2,3])
y = np.array([4,5,6])
df = pd.DataFrame({"numpy": [x,y]})
df["numpy"].mean() #works as expected
Out[231]: array([ 2.5, 3.5, 4.5])
df["numpy"].std() #does not work as expected
Out[231]: TypeError: setting an array element with a sequence.
However, if I go through .values:
df["numpy"].values.mean() #works as expected
Out[231]: array([ 2.5, 3.5, 4.5])
df["numpy"].values.std() #works as expected
Out[233]: array([ 1.5, 1.5, 1.5])
Debug info:
df["numpy"].dtype
Out[235]: dtype('O')
df["numpy"][0].dtype
Out[236]: dtype('int32')
df["numpy"].describe()
Out[237]:
count 2
unique 2
top [1, 2, 3]
freq 1
Name: numpy, dtype: object
df["numpy"]
Out[238]:
0 [1, 2, 3]
1 [4, 5, 6]
Name: numpy, dtype: object
Answer 0 (score: 2)
Assume you have the following original DF (with NumPy arrays of the same shape in the cells):
In [320]: df
Out[320]:
  file      numpy
0    x  [1, 2, 3]
1    y  [4, 5, 6]
Convert it to the following format:
In [321]: d = pd.DataFrame(df['numpy'].values.tolist(), index=df['file'])
In [322]: d
Out[322]:
      0  1  2
file
x     1  2  3
y     4  5  6
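For anyone who wants to reproduce the conversion above, here is a self-contained sketch. The `file` column and its labels `x` and `y` come from the frame shown in this answer; how `df` is constructed is my assumption, since the post does not show that step.

```python
import numpy as np
import pandas as pd

# Assumed construction of the frame shown above: one label column and
# one object column whose cells hold equal-length NumPy arrays.
df = pd.DataFrame({
    "file": ["x", "y"],
    "numpy": [np.array([1, 2, 3]), np.array([4, 5, 6])],
})

# Expand each per-cell array into a row of scalar columns,
# using the labels from the "file" column as the index.
d = pd.DataFrame(df["numpy"].values.tolist(), index=df["file"])

print(d.shape)   # two rows, three numeric columns
print(d.dtypes)  # numeric dtypes instead of object
```

Once every cell holds a scalar, the reductions no longer have to deal with an object column, which is why `std()` works on `d`.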
Now you are free to use the full power of Pandas / NumPy / SciPy:
In [323]: d.sum(axis=1)
Out[323]:
file
x 6
y 15
dtype: int64
In [324]: d.sum(axis=0)
Out[324]:
0 5
1 7
2 9
dtype: int64
In [325]: d.mean(axis=0)
Out[325]:
0 2.5
1 3.5
2 4.5
dtype: float64
In [327]: d.std(axis=0)
Out[327]:
0 2.12132
1 2.12132
2 2.12132
dtype: float64
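Note that the 2.12132 here differs from the 1.5 that `df["numpy"].values.std()` returned in the question. That is because pandas `std()` defaults to the sample standard deviation (`ddof=1`), while NumPy's `std()` defaults to the population one (`ddof=0`). A minimal sketch comparing the two follows; using `np.stack` to get a regular 2-D array is my own choice for the illustration, not something from the answer above.

```python
import numpy as np
import pandas as pd

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
df = pd.DataFrame({"numpy": [x, y]})

# Stack the per-cell arrays into one regular (2, 3) numeric array.
a = np.stack(df["numpy"].tolist())

pop_std = a.std(axis=0)             # NumPy default: ddof=0
sample_std = a.std(axis=0, ddof=1)  # same convention as the pandas default
pandas_std = pd.DataFrame(a).std(axis=0).to_numpy()  # pandas default: ddof=1

print(pop_std)
print(sample_std)
print(pandas_std)
```

Passing `ddof=0` to the pandas call, or `ddof=1` to the NumPy call, makes the two libraries agree.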