Replacing NaN values in pandas

Time: 2017-09-20 15:09:52

Tags: python python-3.x pandas

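I have a DataFrame. Two of its columns are 'Medicine_ID' and 'Counterfeit_Weight'.

For each value of 'Medicine_ID', the 'Counterfeit_Weight' column contains either NaN or a fixed value. How can I replace each NaN with the fixed value that corresponds to that particular 'Medicine_ID'?

An excerpt of my data: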


train_data.loc[train_data['Medicine_ID'] == 'IXN93']  # rows for one particular value of 'Medicine_ID'


2 Answers:

Answer 0: (score: 2)

If I understand correctly, you can use mode.

Data input
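As a quick illustration of what mode does on a column with gaps, here is a minimal sketch. The `weights` Series and its values are made up for the example and are not from the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the weights recorded for one Medicine_ID, one of them missing.
weights = pd.Series([2.0, 2.0, np.nan, 1.0])

# Series.mode() ignores NaN by default and returns a Series of the most
# frequent value(s) in ascending order, so [0] picks one of them.
most_common = weights.mode()[0]
print(most_common)
```

Note that if every value is NaN, `mode()` returns an empty Series and `[0]` raises an error, so an all-NaN group would need separate handling.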

import numpy as np
import pandas as pd

df = pd.DataFrame({'Medicine_ID': ["A", "B", "C", "D"], 'Counterfeit_Weight': [999, 2, np.nan, np.nan]})
df1 = pd.DataFrame({'Medicine_ID': ["A", "A", "B", "B", "C", "C", "C", "D", "D", "D"], 'Counterfeit_Weight': [2, np.nan, 2, np.nan, 2, 2, np.nan, 1, 1, 2]})

Solution

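# Most frequent non-NaN weight per Medicine_ID, as a one-column frame indexed by Medicine_ID
df1 = df1.groupby('Medicine_ID')['Counterfeit_Weight'].apply(lambda x: x.mode()[0]).to_frame()
# The shared column name gets the suffixes _x (from df) and _y (from df1)
df = df.merge(df1, left_on='Medicine_ID', right_index=True)
df['Counterfeit_Weight_x'] = df['Counterfeit_Weight_x'].fillna(df['Counterfeit_Weight_y'])
df.drop(columns='Counterfeit_Weight_y').rename(columns={'Counterfeit_Weight_x': 'Counterfeit_Weight'})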

Out[360]: 
   Counterfeit_Weight Medicine_ID
0               999.0           A
1                 2.0           B
2                 2.0           C
3                 1.0           D
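The same result can probably be reached without the merge and the temporary `_x`/`_y` columns. The following is only a sketch of that variant, reusing the answer's sample data; the name `modes` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Medicine_ID': ["A", "B", "C", "D"],
                   'Counterfeit_Weight': [999, 2, np.nan, np.nan]})
df1 = pd.DataFrame({'Medicine_ID': ["A", "A", "B", "B", "C", "C", "C", "D", "D", "D"],
                    'Counterfeit_Weight': [2, np.nan, 2, np.nan, 2, 2, np.nan, 1, 1, 2]})

# Lookup table (hypothetical name): most frequent weight per Medicine_ID,
# as a Series indexed by Medicine_ID.
modes = df1.groupby('Medicine_ID')['Counterfeit_Weight'].agg(lambda s: s.mode()[0])

# Map each row's ID to its group's value and use it only where the weight is missing.
df['Counterfeit_Weight'] = df['Counterfeit_Weight'].fillna(df['Medicine_ID'].map(modes))
print(df)
```

Existing values such as 999 for "A" are left alone, because `fillna` only touches the missing entries.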

Answer 1: (score: 2)

To replace NaN with the most common value within each Medicine_ID group, you can use groupby with transform and fillna, taking the first index value after value_counts:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':list('abcdefabcdef'),
                   'Counterfeit_Weight':[np.nan,5.0,5.0,np.nan,2.0,4.1,3.0,
                                         np.nan,6.1,np.nan,4.1,4.1],
                   'Medicine_ID':list('caabbbaaabbb')})

print(df)
    A  Counterfeit_Weight Medicine_ID
0   a                 NaN           c
1   b                 5.0           a
2   c                 5.0           a
3   d                 NaN           b
4   e                 2.0           b
5   f                 4.1           b
6   a                 3.0           a
7   b                 NaN           a
8   c                 6.1           a
9   d                 NaN           b
10  e                 4.1           b
11  f                 4.1           b

# Fill with the group's most frequent value; fall back to 0 when the whole group is NaN
f = lambda x: x.fillna(0 if x.isnull().all() else x.value_counts().index[0])
df['Counterfeit_Weight'] = (df.groupby('Medicine_ID')['Counterfeit_Weight']
                             .transform(f))
print(df)
    A  Counterfeit_Weight Medicine_ID
0   a                 0.0           c
1   b                 5.0           a
2   c                 5.0           a
3   d                 4.1           b
4   e                 2.0           b
5   f                 4.1           b
6   a                 3.0           a
7   b                 5.0           a
8   c                 6.1           a
9   d                 4.1           b
10  e                 4.1           b
11  f                 4.1           b
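If, as the question describes, each Medicine_ID really has a single fixed weight, the most-frequent-value step may not be needed at all. Below is a minimal sketch under that assumption: the rows are invented, and the IDs 'ABC12' and 'ZZZ00' are hypothetical ('IXN93' is the one mentioned in the question).

```python
import numpy as np
import pandas as pd

# Hypothetical data: every ID has one fixed weight, sometimes missing.
train_data = pd.DataFrame({
    'Medicine_ID': ['IXN93', 'IXN93', 'IXN93', 'ABC12', 'ABC12', 'ZZZ00'],
    'Counterfeit_Weight': [np.nan, 7.5, 7.5, 3.0, np.nan, np.nan],
})

# 'first' takes the first non-null value in each group;
# transform broadcasts it back to every row of that group.
fixed = train_data.groupby('Medicine_ID')['Counterfeit_Weight'].transform('first')

# Only the missing entries are replaced.
train_data['Counterfeit_Weight'] = train_data['Counterfeit_Weight'].fillna(fixed)
print(train_data)
```

A group with no known weight at all (here 'ZZZ00') stays NaN instead of being set to 0, which makes such IDs easy to spot afterwards.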