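Suppose I have a pandas DataFrame like the one shown in the figure, and I want to fill the NA values using the values of other, similar variables. To be clearer, my variables are
mean_1, mean_2, ..., std_1, std_2, ..., min_1, min_2, ...
So I want to fill NA values with values from other columns, but not from all of them, only from the columns that represent the same measure. In the figure I have highlighted 2 NA values. I want to fill the first one with the average of the "MEAN" variables in row 2, and the second one with the average of the "MIN" variables in row 9. How can I do this?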
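Answer 0 (score: 1)
Yes, this can be done with a loop. The approach below is naive, but I do not think a more advanced approach would gain much in the way of optimization (at least I cannot see how).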
    import pandas as pd

    for i, row in df.iterrows():
        sum_means = 0.0
        n_means = 0
        sum_stds = 0.0
        n_stds = 0
        fill_mean_idxs = []  # 'mean' columns that are missing in this row
        fill_std_idxs = []   # 'std' columns that are missing in this row
        for idx, item in row.items():
            # pd.isna catches NaN as well as None
            if idx.startswith('mean') and pd.isna(item):
                fill_mean_idxs.append(idx)
            elif idx.startswith('mean'):
                sum_means += float(item)
                n_means += 1
            elif idx.startswith('std') and pd.isna(item):
                fill_std_idxs.append(idx)
            elif idx.startswith('std'):
                sum_stds += float(item)
                n_stds += 1
        # Fill only when the row has at least one known value to average
        if n_means:
            ave_mean = sum_means / n_means
            for idx in fill_mean_idxs:
                df.loc[i, idx] = ave_mean
        if n_stds:
            ave_std = sum_stds / n_stds
            for idx in fill_std_idxs:
                df.loc[i, idx] = ave_std
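The loop above only covers the mean and std columns, while the question also mentions min. A minimal sketch of the same row-by-row idea for any list of prefixes follows. The function name `fill_by_prefix_rowwise`, its `prefixes` parameter, and the sample frame are made up for illustration, and the sketch assumes all columns are numeric and named `<prefix>_<number>`.

```python
import numpy as np
import pandas as pd


def fill_by_prefix_rowwise(df, prefixes=("mean", "std", "min")):
    """Return a copy of df in which each NaN is replaced by the average of
    the known values in the same row whose columns share its prefix."""
    out = df.copy()
    # Column names per prefix, computed once (hypothetical naming scheme)
    groups = {p: [c for c in df.columns if c.startswith(p + "_")] for p in prefixes}
    for i, row in df.iterrows():
        for cols in groups.values():
            values = row[cols]
            known = values.dropna()
            if known.empty:
                continue  # nothing to average from, leave the gaps as they are
            for col in values.index[values.isna()]:
                out.loc[i, col] = known.mean()
    return out


# Hypothetical sample data with one gap per measure
df = pd.DataFrame({
    "mean_1": [1.0, np.nan],
    "mean_2": [3.0, 5.0],
    "min_1": [np.nan, 2.0],
    "min_2": [4.0, 6.0],
})
filled = fill_by_prefix_rowwise(df)
print(filled)
```

Returning a copy rather than modifying `df` in place is a design choice of this sketch, not something the answer requires.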
Answer 1 (score: 1)
You can find the unique prefixes, iterate over them, and run fillna separately on the subset of columns for each prefix:
    uniq_prefixes = {col.split('_')[0] for col in df.columns}
    for prfx in uniq_prefixes:
        # Compare the whole prefix so that one prefix cannot match a longer one
        cols = [col for col in df.columns if col.split('_')[0] == prfx]
        # Transpose is needed because row-wise fillna is not implemented yet
        df.loc[:, cols] = df[cols].T.fillna(df[cols].mean(axis=1)).T
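As a possible variation that avoids the double transpose, each column can be filled directly from the per-row means, because `Series.fillna` aligns a Series argument on the index. The sketch below runs on a small made-up frame; the data and the names `prefixes`, `cols`, and `row_means` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two measures, one gap in each
df = pd.DataFrame({
    "mean_1": [1.0, np.nan, 3.0],
    "mean_2": [3.0, 4.0, 5.0],
    "mean_3": [5.0, 6.0, 7.0],
    "min_1": [0.0, 1.0, 2.0],
    "min_2": [2.0, 3.0, np.nan],
})

prefixes = {col.split("_")[0] for col in df.columns}
for prfx in prefixes:
    cols = [col for col in df.columns if col.split("_")[0] == prfx]
    # One mean per row over this measure's columns; NaN is skipped by default
    row_means = df[cols].mean(axis=1)
    for col in cols:
        # fillna with a Series aligns on the index, so row i gets row_means[i]
        df[col] = df[col].fillna(row_means)

print(df)
```

With this data, the gap in mean_1 becomes 5.0 (the average of 4.0 and 6.0 in the same row) and the gap in min_2 becomes 2.0 (the only other min value in that row).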