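For reasons that are hard to explain, I want to average blocks of cells in a pandas DataFrame that is sparsely filled with random values. The DataFrame always holds sqrt(number of columns × number of rows) values; every other cell is NaN. The values are spread roughly evenly, so if I average blocks of the right size I expect each block to contain about one value.
Here is my example. With 100 columns and 100 index entries, I scatter 100 values randomly across the DataFrame. I expect ~1 value per 10x10 block, with everything else NaN. How can I collapse each 10x10 block into a single cell (averaging the 10 column labels, the 10 index labels, and the values)?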
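The sizing argument above, written out as a quick count (a sketch assuming an N × N frame, where N = number_of_planes, holding N non-NaN values and split into square blocks of side √N):

```latex
\begin{aligned}
\text{cells} &= N^2, \qquad \text{values} = \sqrt{N^2} = N,\\
\text{blocks} &= \left(\frac{N}{\sqrt{N}}\right)^2 = N,\\
\text{values per block (on average)} &= \frac{N}{N} = 1 .
\end{aligned}
```

For N = 100 that is 100 blocks of 10 × 10 cells.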
My code:
import pandas as pd
import numpy as np
import math

number_of_planes = 100
thicknesses = np.empty(number_of_planes)
cos_thetas = np.empty(number_of_planes)
phis = np.empty(number_of_planes)
for i in range(number_of_planes):
    phi = np.random.uniform(0, 2 * math.pi)
    theta = math.acos(2 * np.random.uniform(0.5, 1) - 1)  # theta in [0, pi/2]
    thickness = np.random.uniform(0, 0.4)
    phis[i] = phi
    cos_thetas[i] = math.cos(theta)
    thicknesses[i] = thickness

# One row per cos(theta), one column per phi; every cell starts as NaN.
thick_df = pd.DataFrame(columns=phis, index=cos_thetas, dtype=float)
for i in range(len(thicknesses)):
    # DataFrame.set_value was removed in pandas 1.0; .at is the label-based replacement.
    thick_df.at[cos_thetas[i], phis[i]] = thicknesses[i]
thick_df = thick_df.sort_index(axis=0, ascending=False)
thick_df = thick_df.sort_index(axis=1)
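Because each plane contributes one cos_theta row label and one phi column label, every row and every column of thick_df holds exactly one value (assuming no two planes draw exactly the same phi or cos_theta, which holds almost surely for continuous draws). So "~1 per block" is only an average: some blocks get 0 values and some get 2 or more. A minimal sketch to check this, assuming a seeded numpy Generator (the seed is only for reproducibility) and using cos_theta = 2u - 1 directly, since cos(acos(x)) = x:

```python
import math

import numpy as np
import pandas as pd

# Rebuild the question's frame; the seed is an added assumption for reproducibility.
rng = np.random.default_rng(0)
number_of_planes = 100
phis = rng.uniform(0, 2 * math.pi, number_of_planes)
cos_thetas = 2 * rng.uniform(0.5, 1, number_of_planes) - 1  # same as cos(acos(2u - 1))
thicknesses = rng.uniform(0, 0.4, number_of_planes)

thick_df = pd.DataFrame(np.nan, index=cos_thetas, columns=phis)
for c, p, t in zip(cos_thetas, phis, thicknesses):
    thick_df.at[c, p] = t
thick_df = thick_df.sort_index(axis=0, ascending=False).sort_index(axis=1)

# Count the non-NaN cells in each n x n block of positions.
n = math.isqrt(number_of_planes)
filled = thick_df.notna().to_numpy()
counts = filled.reshape(n, n, n, n).sum(axis=(1, 3))
print(counts)
```

Blocks with a count of 0 end up NaN after averaging, and blocks with a count of 2 or more are averaged over several planes.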
Answer 0 (score: 3)
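IIUC, you can reshape into a 4D array, splitting each axis into two axes of length sqrt(len of each axis), then compute the mean along the second and fourth axes, ignoring NaNs with np.nanmean: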
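arr = thick_df.values.astype(float)
n = int(np.sqrt(number_of_planes))  # block side; assumes number_of_planes is a perfect square
# (n*n, n*n) -> (row block, row within block, col block, col within block)
out = np.nanmean(arr.reshape(n, n, n, n), axis=(1, 3))
# label of each block = mean of its n index labels / n column labels
indx = thick_df.index.values.reshape(-1, n).mean(1)
coln = thick_df.columns.values.reshape(-1, n).mean(1)
df_out = pd.DataFrame(out, index=indx, columns=coln)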
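The reshape trick generalizes to rectangular m × k blocks as long as each dimension is divisible by the block size. A sketch under that assumption; block_nanmean is a hypothetical helper name, and the explicit loop is only there to cross-check the vectorized version on a small array:

```python
import warnings

import numpy as np


def block_nanmean(arr: np.ndarray, m: int, k: int) -> np.ndarray:
    """Mean of each m x k block of `arr`, ignoring NaN (hypothetical helper)."""
    rows, cols = arr.shape
    if rows % m or cols % k:
        raise ValueError("array shape must be divisible by the block shape")
    # (rows, cols) -> (row block, row within block, col block, col within block)
    blocks = arr.reshape(rows // m, m, cols // k, k)
    with warnings.catch_warnings():
        # All-NaN blocks return NaN and would emit "Mean of empty slice".
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(blocks, axis=(1, 3))


# Cross-check against an explicit loop on a small, sparsely filled array.
rng = np.random.default_rng(1)
arr = np.full((6, 8), np.nan)
arr[rng.integers(0, 6, 10), rng.integers(0, 8, 10)] = rng.uniform(size=10)

fast = block_nanmean(arr, 3, 4)
slow = np.full((2, 2), np.nan)
for bi in range(2):
    for bj in range(2):
        vals = arr[bi * 3:(bi + 1) * 3, bj * 4:(bj + 1) * 4]
        vals = vals[~np.isnan(vals)]
        if vals.size:
            slow[bi, bj] = vals.mean()
print(fast)
```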
Sample run:
In [174]: thick_df # number_of_planes = 4
Out[174]:
          4.550477  5.138694  5.411510  6.123163
0.981987       NaN       NaN  0.393233       NaN
0.565861  0.186647       NaN       NaN       NaN
0.193190       NaN       NaN       NaN   0.11626
0.088382       NaN  0.166189       NaN       NaN

In [175]: df_out
Out[175]:
          4.844586  5.767337
0.773924  0.186647  0.393233
0.140786  0.166189  0.116260
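To check the sample run end to end, the 4 × 4 frame can be typed in from the printed values and passed through the same steps (a sketch; the inputs are copied from the six-decimal output above):

```python
import numpy as np
import pandas as pd

# The 4 x 4 frame from the sample run (number_of_planes = 4), typed in by hand.
nan = np.nan
thick_df = pd.DataFrame(
    [[nan, nan, 0.393233, nan],
     [0.186647, nan, nan, nan],
     [nan, nan, nan, 0.11626],
     [nan, 0.166189, nan, nan]],
    index=[0.981987, 0.565861, 0.193190, 0.088382],
    columns=[4.550477, 5.138694, 5.411510, 6.123163],
)

n = 2  # int(np.sqrt(4))
arr = thick_df.to_numpy(dtype=float)
out = np.nanmean(arr.reshape(n, n, n, n), axis=(1, 3))
indx = thick_df.index.to_numpy().reshape(-1, n).mean(1)
coln = thick_df.columns.to_numpy().reshape(-1, n).mean(1)
df_out = pd.DataFrame(out, index=indx, columns=coln)
print(df_out)
```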
Answer 1 (score: 3)
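m, n = 10, 10  # block height (rows) and block width (columns)
# positional block number of every row and every column
row_groups = np.arange(len(thick_df.index)) // m
col_groups = np.arange(len(thick_df.columns)) // n
# relabel rows/columns with their block numbers, then average per (row block, column block)
grpd = pd.DataFrame(thick_df.values, row_groups, col_groups)
val = (pd.to_numeric(grpd.stack(), errors='coerce')
       .groupby(level=[0, 1]).mean()
       .unstack().values)
# block labels = mean of the original index / column labels in each group
idx = thick_df.index.to_series().groupby(row_groups).mean().values
col = thick_df.columns.to_series().groupby(col_groups).mean().values
df_out = pd.DataFrame(val, idx, col)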
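A sketch that cross-checks Answer 1's groupby idea against Answer 0's reshape on question-style data. It builds the (row block, column block) key of every cell directly with np.repeat / np.tile instead of calling stack() on the duplicated group labels; that key construction and the seed are assumptions of this sketch, not part of either answer:

```python
import math
import warnings

import numpy as np
import pandas as pd

# Question-style data with a seeded generator (seed is an assumption).
rng = np.random.default_rng(0)
N = 100
phis = rng.uniform(0, 2 * math.pi, N)
cos_thetas = 2 * rng.uniform(0.5, 1, N) - 1
thick_df = pd.DataFrame(np.nan, index=cos_thetas, columns=phis)
for c, p, t in zip(cos_thetas, phis, rng.uniform(0, 0.4, N)):
    thick_df.at[c, p] = t
thick_df = thick_df.sort_index(axis=0, ascending=False).sort_index(axis=1)

m, n = 10, 10
nrows, ncols = thick_df.shape
row_groups = np.arange(nrows) // m
col_groups = np.arange(ncols) // n

# groupby route: one (row block, col block) key per cell, in row-major order
values = pd.Series(thick_df.to_numpy(dtype=float).ravel())
keys = [np.repeat(row_groups, ncols), np.tile(col_groups, nrows)]
by_groupby = values.groupby(keys).mean().unstack().to_numpy()

# reshape route (Answer 0)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)  # all-NaN blocks
    by_reshape = np.nanmean(
        thick_df.to_numpy(dtype=float).reshape(nrows // m, m, ncols // n, n),
        axis=(1, 3),
    )
```

Both routes give the same 10 × 10 array (NaN where a block is empty). The reshape route is fully vectorized but needs the block size to divide the frame exactly, while the groupby keys also work when the last block is smaller.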