Grouping rows by columns with similar names

Date: 2020-11-03 17:46:30

Tags: python pandas dataframe

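Suppose I have a pandas DataFrame like the one shown, and I want to fill the NA values using the values of other, similar variables. To be clearer, my variables are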

mean_1, mean_2, ..., std_1, std_2, ..., min_1, min_2, ...

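So I want to fill the NA values using values from other columns, but not all of them, only the columns that represent the same measure. In the picture I have highlighted 2 NA values. I want to fill the first one with the mean of the "MEAN" variables in row 2, and the second one with the mean of the "MIN" variables in row 9. Is there a way to do this?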

[Image: the DataFrame, with the two NA values highlighted]

2 answers:

Answer 0: (score: 1)

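Yes, this can be done with a loop. Below is the naive approach, but even a more sophisticated approach would not leave much room for optimization (at least I don't see any).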

import pandas as pd

# Assumes df is the DataFrame from the question
for i, row in df.iterrows():
    sum_means = 0.0
    n_means = 0
    sum_stds = 0.0
    n_stds = 0
    fill_mean_idxs = []
    fill_std_idxs = []
    # Walk the cells of this row (Series.iteritems() was removed in pandas 2.0)
    for idx, item in row.items():
        if idx.startswith('mean') and pd.isna(item):
            fill_mean_idxs.append(idx)
        elif idx.startswith('mean'):
            sum_means += float(item)
            n_means += 1
        elif idx.startswith('std') and pd.isna(item):
            fill_std_idxs.append(idx)
        elif idx.startswith('std'):
            sum_stds += float(item)
            n_stds += 1
    # Fill only when the row has at least one observed value to average
    if n_means:
        ave_mean = sum_means / n_means
        for idx in fill_mean_idxs:
            df.loc[i, idx] = ave_mean
    if n_stds:
        ave_std = sum_stds / n_stds
        for idx in fill_std_idxs:
            df.loc[i, idx] = ave_std
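
For anyone who wants to try the loop idea on something concrete, here is a minimal sketch that wraps it in a function and runs it on a tiny frame. The function name `fill_row_means_loop`, the `prefixes` parameter and the sample values are my own assumptions for illustration; none of them come from the question.

```python
import numpy as np
import pandas as pd


def fill_row_means_loop(df, prefixes=("mean", "std")):
    """Fill each NaN with the mean of the same-prefix columns in its row."""
    out = df.copy()
    for i, row in out.iterrows():
        for prefix in prefixes:
            # Cells of this row whose column name starts with the prefix
            group = row[[c for c in out.columns if c.startswith(prefix)]]
            observed = group.dropna()
            if observed.empty:
                continue  # nothing to average, leave the gaps as NaN
            fill_value = float(observed.mean())
            for col in group[group.isna()].index:
                out.loc[i, col] = fill_value
    return out


# Hypothetical sample data, not taken from the question
sample = pd.DataFrame({
    "mean_1": [1.0, np.nan],
    "mean_2": [3.0, 4.0],
    "mean_3": [5.0, 6.0],
    "std_1": [0.5, 0.2],
    "std_2": [np.nan, 0.4],
})
filled = fill_row_means_loop(sample)
print(filled)
```

The function works on a copy, so the original frame keeps its NaN values and the two can be compared side by side.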

Answer 1: (score: 1)

You can find the unique prefixes, iterate over each prefix, and run fillna separately on each subset of columns.

uniq_prefixes = {x.split('_')[0] for x in df.columns}

for prfx in uniq_prefixes:
    mask = [col for col in df if col.startswith(prfx)]
    # Transpose is needed because row-wise fillna is not implemented yet
    df.loc[:, mask] = df[mask].T.fillna(df[mask].mean(axis=1)).T
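
The per-prefix fill can also be sketched as a self-contained function, which makes it easier to check on a small example. The name `fill_by_prefix`, the `sep` parameter and the sample values below are assumptions made for illustration, not part of the question or the answer.

```python
import numpy as np
import pandas as pd


def fill_by_prefix(df, sep="_"):
    """Fill NaNs with the row mean of the columns sharing a name prefix."""
    out = df.copy()
    # Map each prefix to its columns, e.g. "mean" -> ["mean_1", "mean_2"]
    groups = {}
    for col in out.columns:
        groups.setdefault(col.split(sep)[0], []).append(col)
    for cols in groups.values():
        block = out[cols]
        row_means = block.mean(axis=1)  # NaNs are skipped by default
        # fillna matches a Series against column labels, so transpose first
        out[cols] = block.T.fillna(row_means).T
    return out


# Hypothetical sample data, not taken from the question
sample = pd.DataFrame({
    "mean_1": [np.nan, 1.0],
    "mean_2": [2.0, 3.0],
    "mean_3": [4.0, 5.0],
    "min_1": [0.0, 1.0],
    "min_2": [1.0, np.nan],
    "min_3": [2.0, 2.0],
})
filled = fill_by_prefix(sample)
print(filled)
```

Grouping on the exact prefix, rather than on `startswith`, is a deliberate choice in this sketch: it keeps a prefix such as `min` from also picking up a column whose prefix merely begins with the same letters.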