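I have a DataFrame. Two of its columns are 'Medicine_ID' and 'Counterfeit_Weight'.
For each value of 'Medicine_ID', the 'Counterfeit_Weight' column contains either NaN or one fixed value. How can I replace each NaN with the fixed value that belongs to the same 'Medicine_ID'?
An excerpt of my data: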
train_data.loc[train_data['Medicine_ID'] == 'IXN93']  # rows for one particular 'Medicine_ID'
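Since each 'Medicine_ID' has at most one non-NaN weight, a minimal sketch is to broadcast each group's first non-null value back onto its rows and fill only the gaps. The toy data below is hypothetical (only the ID 'IXN93' comes from the question), and it assumes the "one fixed value per ID" property holds.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: each Medicine_ID has one fixed weight or NaN.
train_data = pd.DataFrame({
    "Medicine_ID": ["IXN93", "IXN93", "ABC12", "ABC12", "ABC12"],
    "Counterfeit_Weight": [np.nan, 13.1, 7.5, np.nan, 7.5],
})

# transform("first") returns, for every row, the first non-null weight of
# that row's group (NaN if the whole group is NaN).
fixed = train_data.groupby("Medicine_ID")["Counterfeit_Weight"].transform("first")

# Only the NaN entries are replaced; existing weights are kept.
train_data["Counterfeit_Weight"] = train_data["Counterfeit_Weight"].fillna(fixed)
print(train_data)
```

If an ID could carry several different weights, "first" would silently pick one of them, which is why the answers below use the most frequent value instead.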
Answer 0 (score: 2)
If I understand you correctly, you can use mode.
Input data:
import numpy as np
import pandas as pd
df=pd.DataFrame({'Medicine_ID':["A","B","C","D"],'Counterfeit_Weight':[999,2,np.nan,np.nan]})
df1=pd.DataFrame({'Medicine_ID':["A","A","B","B","C","C","C","D","D","D"],'Counterfeit_Weight':[2,np.nan,2,np.nan,2,2,np.nan,1,1,2]})
Solution:
df1=df1.groupby('Medicine_ID')['Counterfeit_Weight'].apply(lambda x : x.mode()[0]).to_frame()
df=df.merge(df1,left_on='Medicine_ID',right_index=True)
df['Counterfeit_Weight_x']=df['Counterfeit_Weight_x'].fillna(df['Counterfeit_Weight_y'])
df.drop(columns='Counterfeit_Weight_y').rename(columns={'Counterfeit_Weight_x':'Counterfeit_Weight'})
Out[360]:
Counterfeit_Weight Medicine_ID
0 999.0 A
1 2.0 B
2 2.0 C
3 1.0 D
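The merge/drop/rename steps above can be condensed with Series.map, which looks each row's ID up in a per-ID Series of modes, so no _x/_y suffixes appear. This is a sketch reusing the same df and df1; first_mode is a hypothetical helper that also guards against an all-NaN group, where mode() returns an empty Series and indexing it with [0] would fail.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Medicine_ID': ["A", "B", "C", "D"],
                   'Counterfeit_Weight': [999, 2, np.nan, np.nan]})
df1 = pd.DataFrame({'Medicine_ID': ["A", "A", "B", "B", "C", "C", "C", "D", "D", "D"],
                    'Counterfeit_Weight': [2, np.nan, 2, np.nan, 2, 2, np.nan, 1, 1, 2]})

def first_mode(s):
    # Hypothetical helper: mode() drops NaN and returns the modes sorted;
    # an all-NaN group gives an empty result, so fall back to NaN.
    m = s.mode()
    return m.iloc[0] if not m.empty else np.nan

# One mode per Medicine_ID, as a Series indexed by the ID.
modes = df1.groupby('Medicine_ID')['Counterfeit_Weight'].agg(first_mode)

# map() aligns each row's ID with `modes`; fillna touches only the NaN rows.
df['Counterfeit_Weight'] = df['Counterfeit_Weight'].fillna(df['Medicine_ID'].map(modes))
print(df)
```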
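Answer 1 (score: 2)
To replace NaN with the most common value within each Medicine_ID group, use groupby with transform, and inside it fillna with the first index value of value_counts (i.e. the most frequent value). Groups that contain only NaN are filled with 0: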
import numpy as np
import pandas as pd
df = pd.DataFrame({'A':list('abcdefabcdef'),
'Counterfeit_Weight':[np.nan,5.0,5.0,np.nan,2.0,4.1,3.0,
np.nan,6.1,np.nan,4.1,4.1],
'Medicine_ID':list('caabbbaaabbb')})
print (df)
A Counterfeit_Weight Medicine_ID
0 a NaN c
1 b 5.0 a
2 c 5.0 a
3 d NaN b
4 e 2.0 b
5 f 4.1 b
6 a 3.0 a
7 b NaN a
8 c 6.1 a
9 d NaN b
10 e 4.1 b
11 f 4.1 b
f = lambda x: x.fillna(0 if x.isnull().all() else x.value_counts().index[0])
df['Counterfeit_Weight'] = (df.groupby('Medicine_ID')['Counterfeit_Weight']
.transform(f))
print (df)
A Counterfeit_Weight Medicine_ID
0 a 0.0 c
1 b 5.0 a
2 c 5.0 a
3 d 4.1 b
4 e 2.0 b
5 f 4.1 b
6 a 3.0 a
7 b 5.0 a
8 c 6.1 a
9 d 4.1 b
10 e 4.1 b
11 f 4.1 b
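A variant of the same groupby/transform idea, sketched under the assumption that an all-NaN group should stay NaN rather than become 0: it uses Series.mode instead of value_counts. mode() ignores NaN and returns its values in sorted order, so on a tie the smallest of the most frequent values is used. fill_with_mode is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': list('abcdefabcdef'),
                   'Counterfeit_Weight': [np.nan, 5.0, 5.0, np.nan, 2.0, 4.1, 3.0,
                                          np.nan, 6.1, np.nan, 4.1, 4.1],
                   'Medicine_ID': list('caabbbaaabbb')})

def fill_with_mode(s):
    # Hypothetical helper: fill NaN with the group's (smallest) mode;
    # an all-NaN group has no mode and is returned unchanged.
    m = s.mode()
    return s if m.empty else s.fillna(m.iloc[0])

df['Counterfeit_Weight'] = (df.groupby('Medicine_ID')['Counterfeit_Weight']
                              .transform(fill_with_mode))
print(df)
```

Compared with the lambda above, row 0 (group c) keeps NaN instead of 0.0; all other rows get the same values.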