How to fill values based on data in a column and an array? Pandas

Time: 2017-10-25 16:22:15

Tags: python pandas numpy

Suppose I have a dataframe with NaNs in each group, like

df = pd.DataFrame({'data':[0,1,2,0,np.nan,2,np.nan,0,1],'group':[1,1,1,2,2,2,3,3,3]})

and a numpy array like

x = np.array([0,1,2])

Now, based on the group, how can I fill the missing values from the numpy array I have, i.e.

df = pd.DataFrame({'data':[0,1,2,0,1,2,2,0,1],'group':[1,1,1,2,2,2,3,3,3]})
   data  group
0     0      1
1     1      1
2     2      1
3     0      2
4     1      2
5     2      2
6     2      3
7     0      3
8     1      3

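Let me explain how the data should be filled. Consider group 2. Its data values are 0, np.nan, 2. The np.nan stands for the value from the array [0,1,2] that is absent from the group, so the value to fill in is 1.

For multiple NaN values, take for example a group with data [np.nan,0,np.nan]. The values to fill in place of the NaNs are 1 and 2, resulting in [1,0,2].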
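The rule described above can be sketched on a single group without pandas. This is only an illustration: the helper name `fill_from_pool` is hypothetical, and handing out the missing values in ascending order is an assumption inferred from the [1,0,2] example.

```python
import math


def is_nan(v):
    # True only for float NaN; ints and other values are never NaN
    return isinstance(v, float) and math.isnan(v)


def fill_from_pool(values, pool):
    """Replace each NaN in `values` with the next pool value the group lacks."""
    present = {v for v in values if not is_nan(v)}
    # pool values that do not occur in the group, smallest first (assumption)
    missing = iter(sorted(set(pool) - present))
    # if the missing values run out, the NaN is left in place
    return [next(missing, v) if is_nan(v) else v for v in values]


nan = float("nan")
print(fill_from_pool([0, nan, 2], [0, 1, 2]))    # [0, 1, 2]
print(fill_from_pool([nan, 0, nan], [0, 1, 2]))  # [1, 0, 2]
```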

1 answer:

Answer 0: (score: 4)

First find the missing value, then pass it to fillna

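def f(y):
    # values of x that do not occur in this group, in ascending order
    a = sorted(set(x) - set(y))
    # fall back to 1 when no value of x is missing from the group
    a = 1 if len(a) == 0 else a[0]
    return y.fillna(a)

# group_keys=False keeps the original index so the result aligns with df
df['data'] = df.groupby('group', group_keys=False)['data'].apply(f).astype(int)
print(df)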
   data  group
0     0      1
1     1      1
2     2      1
3     0      2
4     1      2
5     2      2
6     2      3
7     0      3
8     1      3

Edit:

df = pd.DataFrame({'data':[0,1,2,0,np.nan,2,np.nan,np.nan,1, np.nan, np.nan, np.nan],
                   'group':[1,1,1,2,2,2,3,3,3,4,4,4]})
x = np.array([0,1,2])
print(df)
    data  group
0    0.0      1
1    1.0      1
2    2.0      1
3    0.0      2
4    NaN      2
5    2.0      2
6    NaN      3
7    NaN      3
8    1.0      3
9    NaN      4
10   NaN      4
11   NaN      4
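def f(y):
    # values of x that do not occur in this group, in ascending order
    a = sorted(set(x) - set(y))
    if len(a) == 1:
        return y.fillna(a[0])
    elif len(a) == 2:
        # the first NaN gets a[0], the remaining NaN gets a[1]
        return y.fillna(a[0], limit=1).fillna(a[1])
    elif len(a) == 3:
        # the whole group is NaN: use x itself (assumes len(x) rows per group)
        return pd.Series(x, index=y.index)
    else:
        return y

# group_keys=False keeps the original index so the result aligns with df
df['data'] = df.groupby('group', group_keys=False)['data'].apply(f).astype(int)
print(df)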
    data  group
0      0      1
1      1      1
2      2      1
3      0      2
4      1      2
5      2      2
6      0      3
7      2      3
8      1      3
9      0      4
10     1      4
11     2      4
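
The branches on len(a) above are tied to an array of exactly three values. A possible generalization, given here only as a sketch, fills the NaNs of each group one by one with whatever values of x the group lacks. The name `fill_missing` is hypothetical, filling in ascending order is an assumption, and `transform` is swapped in for `apply` so the result keeps the original index.

```python
import numpy as np
import pandas as pd

x = np.array([0, 1, 2])


def fill_missing(y, pool=x):
    """Fill the NaNs of one group, in order, with the pool values it lacks."""
    # pool values absent from the group, smallest first (assumption)
    missing = sorted(set(pool) - set(y.dropna()))
    nan_idx = y.index[y.isna()]
    # never assign more values than there are NaNs (or missing values)
    n = min(len(missing), len(nan_idx))
    out = y.copy()
    if n:
        out.loc[nan_idx[:n]] = missing[:n]
    return out


df = pd.DataFrame({
    'data': [0, 1, 2, 0, np.nan, 2, np.nan, np.nan, 1, np.nan, np.nan, np.nan],
    'group': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
})

# transform calls fill_missing once per group and preserves df's index
df['data'] = df.groupby('group')['data'].transform(fill_missing).astype(int)
print(df)
```

On the sample frame from the edit this yields the same data column as the output shown above. The final astype(int) fails if a group has more NaNs than missing values, because some NaNs then remain.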