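Given a list of DataFrames, I want to compute the element-wise average DataFrame and the overall average value across the list.
The sample data is generated as follows: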
import numpy as np
import pandas as pd
d1 = np.random.normal(0, 0.5, 20).reshape(-1, 4)
d2 = np.random.normal(0, 0.5, 20).reshape(-1, 4)
d3 = np.random.normal(0, 0.5, 20).reshape(-1, 4)
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
df3 = pd.DataFrame(d3)
df_list = [df1, df2, df3]
What I currently do:
from functools import reduce
average_df = reduce(lambda x, y: x + y, df_list)/len(df_list)
average_df
          0         1         2         3
0 -0.034682 -0.022264 -0.138824 -0.146104
1  0.419488  0.383894 -0.152312 -0.306009
2  0.155335  0.317097 -0.225921 -0.178944
3 -0.383138 -0.120236  0.069074  0.050598
4  0.050671  0.368507  0.010924  0.394945
average_value = average_df.mean().mean()
average_value
0.02560486119
Question: I don't think this is the best approach. Is there a faster way or a built-in function for this?
Answer 0 (score: 1)
If you don't mind discarding the labeled axes, I would just convert to an np.array:
In [16]: df_list = [df1, df2, df3]
In [17]: df_list
Out[17]:
[          0         1         2         3
 0  0.132306 -0.364364  0.596958  0.406588
 1  0.831853  0.049103 -0.606819 -0.858509
 2  0.251377  0.656292 -0.402637 -0.079849
 3 -0.803913  0.047060  0.684442 -0.593213
 4 -0.376936  0.213803 -0.684231  0.042000,
           0         1         2         3
 0 -0.178879 -0.016869 -0.232023 -0.166521
 1 -0.588778  0.013769  0.540631  0.381502
 2 -0.995349  0.155972  0.023558 -0.307145
 3  0.462249 -0.742847  0.235321  0.395132
 4 -0.053568 -0.329233  0.132231  0.917006,
           0         1         2         3
 0  0.352663 -0.832304  0.072619 -0.393198
 1  1.038936  0.923296  0.657013 -0.034282
 2  0.090368  0.433762 -0.305223 -0.378425
 3  0.046863  0.248066 -0.418274 -0.522701
 4  0.222447 -0.322698 -0.262695 -0.718779]
In [18]: arr = np.array([df.values for df in df_list])
In [19]: arr
Out[19]:
array([[[ 0.1323056 , -0.36436411, 0.59695824, 0.4065878 ],
[ 0.83185277, 0.04910304, -0.60681886, -0.85850892],
[ 0.25137706, 0.6562918 , -0.4026369 , -0.07984943],
[-0.80391254, 0.04706034, 0.68444161, -0.59321321],
[-0.37693554, 0.21380315, -0.68423123, 0.04199972]],
[[-0.17887865, -0.01686896, -0.23202261, -0.16652074],
[-0.58877762, 0.01376924, 0.54063094, 0.38150206],
[-0.99534857, 0.15597235, 0.02355771, -0.30714476],
[ 0.46224899, -0.74284654, 0.23532056, 0.39513248],
[-0.05356796, -0.3292326 , 0.13223064, 0.91700633]],
[[ 0.35266324, -0.83230408, 0.07261917, -0.39319835],
[ 1.03893574, 0.92329583, 0.65701318, -0.03428247],
[ 0.0903683 , 0.43376195, -0.30522277, -0.37842503],
[ 0.04686314, 0.24806568, -0.41827387, -0.52270129],
[ 0.22244721, -0.32269779, -0.2626949 , -0.71877921]]])
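Then you simply take the mean along the first axis for the average frame, or over all axes for the overall average: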
In [20]: arr.mean(axis=0)
Out[20]:
array([[ 0.10203006, -0.40451238, 0.1458516 , -0.05104376],
[ 0.42733697, 0.3287227 , 0.19694175, -0.17042977],
[-0.21786774, 0.41534203, -0.22810065, -0.25513974],
[-0.0982668 , -0.14924017, 0.16716277, -0.24026067],
[-0.0693521 , -0.14604242, -0.27156516, 0.08007561]])
In [21]: arr.mean()
Out[21]: -0.021917893552528236
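If the labels matter after all, the array result can be wrapped back into a DataFrame. The following is only a minimal sketch of that idea, assuming every frame in the list has the same shape and the same index and columns; the seed and the reuse of the first frame's labels are illustrative choices, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Illustrative data: three 5x4 frames (the seed is arbitrary).
rng = np.random.default_rng(0)
df_list = [pd.DataFrame(rng.normal(0, 0.5, (5, 4))) for _ in range(3)]

# Stack into one 3-D array of shape (n_frames, n_rows, n_cols).
arr = np.stack([df.to_numpy() for df in df_list])

# Element-wise mean across frames, re-labeled with the first frame's axes.
# Assumption: all frames share the same index and columns.
average_df = pd.DataFrame(
    arr.mean(axis=0),
    index=df_list[0].index,
    columns=df_list[0].columns,
)

# Mean over every entry of every frame.
average_value = arr.mean()

print(average_df)
print(average_value)
```

Because every frame has the same number of entries, `arr.mean()` equals the mean of the averaged frame.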
Answer 1 (score: 0)
Using sum:
sum(df_list) / len(df_list)
          0         1         2         3
0  0.102030 -0.404512  0.145851 -0.051044
1  0.427337  0.328723  0.196942 -0.170430
2 -0.217868  0.415342 -0.228101 -0.255140
3 -0.098267 -0.149240  0.167163 -0.240261
4 -0.069352 -0.146043 -0.271565  0.080076
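Using pd.concat:
# Average rows that share an index label (mean(level=0) was removed in pandas 2.0).
pd.concat(df_list).groupby(level=0).mean()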
          0         1         2         3
0  0.102030 -0.404512  0.145851 -0.051044
1  0.427337  0.328723  0.196942 -0.170430
2 -0.217868  0.415342 -0.228101 -0.255140
3 -0.098267 -0.149240  0.167163 -0.240261
4 -0.069352 -0.146043 -0.271565  0.080076
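As a quick sanity check, the two approaches in this answer can be compared on freshly generated data. This is a hedged sketch rather than a benchmark: it assumes all frames share identical index and column labels, and the seed and variable names (`by_sum`, `by_concat`, `overall`) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: three 5x4 frames with identical labels (arbitrary seed).
rng = np.random.default_rng(1)
df_list = [pd.DataFrame(rng.normal(0, 0.5, (5, 4))) for _ in range(3)]

# Approach 1: the built-in sum starts from 0 and adds each frame element-wise.
by_sum = sum(df_list) / len(df_list)

# Approach 2: stack the frames vertically, then average rows with the same index label.
by_concat = pd.concat(df_list).groupby(level=0).mean()

# Overall average across every entry.
overall = by_concat.to_numpy().mean()

# Both approaches should agree when the labels line up exactly.
same = np.allclose(by_sum.to_numpy(), by_concat.to_numpy())
print(same)
print(overall)
```

The two differ once labels stop matching: addition yields NaN for any row label missing from one of the frames, while the `groupby` version averages whatever rows exist for each label.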