I have a Pandas DataFrame with MultiIndex (two-level) columns.
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)
print(df)
first        bar                 baz                 foo                 qux  \
second       one       two       one       two       one       two       one
A      -0.093829 -0.159939 -0.386961 -0.367417  0.625646  1.286186  0.429855
B       0.440266  0.345161  1.798363 -1.265215  0.204303 -1.492993 -1.714360
C       0.689076 -1.211060 -0.265888  0.769467 -0.706941  0.086907 -0.892892

first
second       two
A      -1.006210
B      -0.275578
C      -0.563757
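I would like to compute the mean and standard deviation of each column, grouped by the upper column level ('first'). Once the mean and standard deviation are computed, I want to double the columns in the lower level, adding the name of the statistic (mean or std) to the column name as "column name" + "_" + "mean" or "std", for example one_mean and one_std.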
from itertools import product

# Group the columns by their upper level ('first').
# Note: groupby(..., axis=1) is deprecated in recent pandas releases.
group_cols = df.groupby(df.columns.get_level_values('first'), axis=1)
list_stat_dfs = []
for key, group in group_cols:
    group_descr = group.describe().loc[['mean', 'std'], :]  # Get mean and std from single site
    group_descr.loc[:, (key, 'stats')] = group_descr.index
    group_descr.loc[:, (key, 'first')] = key
    group_descr.columns = group_descr.columns.droplevel(0)  # Remove upper level column (site_name)
    group_descr = group_descr.pivot(columns='stats', index='first')  # Rows to columns
    col_prod = list(product(group_descr.columns.levels[0], group_descr.columns.levels[1]))
    cols = ['_'.join((col[0], col[1])) for col in col_prod]
    group_descr.columns = pd.MultiIndex.from_product(([key], cols))  # From multiple columns to single column
    group_descr.reset_index(inplace=True)
    list_stat_dfs.append(group_descr)
# Put the per-group results side by side
group_descr = pd.concat(list_stat_dfs, axis=1)
print(group_descr)
   first       bar                                first       baz            \
          one_mean   one_std  two_mean   two_std         one_mean   one_std
0    bar  0.507185  1.799053 -0.249692   1.41507    baz -0.147664  0.595927

                       first       foo                                first  \
   two_mean   two_std         one_mean   one_std  two_mean   two_std
0  0.160018  1.405113    foo -0.433644  1.245972  0.254995  0.846983    qux

        qux
   one_mean   one_std  two_mean   two_std
0  0.667629  0.315417 -0.757989  0.683273
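As you can see, I have managed to do this with a for loop and some extra code. Could someone show a more optimized way to do the same thing? I am pretty sure that with Pandas this can be done in a few lines of code.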
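For reference, here is a minimal sketch of one shorter route, not a definitive answer. It assumes the goal is a single-row frame whose columns are ('first', 'second_stat') pairs, and it leaves out the repeated 'first' helper columns that appear in the output above. The names stats, flat and result are illustrative only, and the seeded generator is an assumption added so the example is reproducible.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
columns = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

# Seeded generator (an assumption, only for reproducibility of this sketch)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((3, 8)), index=['A', 'B', 'C'], columns=columns)

# Two rows ('mean', 'std'), same MultiIndex columns as df
stats = df.agg(['mean', 'std'])

# Move the statistic name into the index: Series indexed by (first, second, stat)
flat = stats.unstack()

# Merge the last two levels into "<second>_<stat>", keeping 'first' on top
flat.index = pd.MultiIndex.from_tuples(
    [(first, f"{second}_{stat}") for first, second, stat in flat.index]
)

# Back to a one-row DataFrame with the two-level columns
result = flat.to_frame().T
print(result)
```

Because `agg` works column by column, no explicit grouping on the upper level is needed here; the upper level is simply carried along in the column index.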