How to use fillna() to insert values in a column when certain conditions are met in other columns

Asked: 2018-03-03 19:22:33

Tags: python pandas ipython data-science

I computed value counts for the rows where Credit_History is NaN.

Output when Credit_History is NaN:

Self_Employed
Yes  532
No   32

Married
No   398
Yes  21

For the numeric columns, I calculated the mean of each column over the rows where Credit_History is NaN.

Output (numeric columns):

Mean Applicant Income: 54003.1232
LoanAmount: 35435.12
Loan_Amount_Term: 360
ApplicantIncome: 30000
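
For reference, summaries like the ones above could be produced roughly as follows. This is only a sketch: the DataFrame below is hypothetical toy data (the column names follow the question, the values are made up), and the real dataset is not shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names follow the question, values are invented.
df = pd.DataFrame({
    'Self_Employed': ['Yes', 'No', 'Yes', 'No', 'Yes'],
    'Married': ['No', 'No', 'Yes', 'No', 'Yes'],
    'ApplicantIncome': [30000, 5000, 42000, 18000, 7000],
    'LoanAmount': [120.0, 80.0, 200.0, 95.0, 60.0],
    'Credit_History': [np.nan, 1.0, np.nan, np.nan, 0.0],
})

# Keep only the rows where Credit_History is missing.
missing = df[df['Credit_History'].isna()]

# Counts per category for the non-numeric columns.
self_employed_counts = missing['Self_Employed'].value_counts()
married_counts = missing['Married'].value_counts()

# Means of the numeric columns over the same rows.
numeric_means = missing[['ApplicantIncome', 'LoanAmount']].mean()

print(self_employed_counts)
print(married_counts)
print(numeric_means)
```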

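How can I use fillna() for these cases:

Case 1: when Self_Employed = Y and Married = N, Credit_History should be 0

Case 2: when Self_Employed = N and ApplicantIncome > 20000, Credit_History should be 1

Case 3: when Self_Employed = Y, Married = N and ApplicantIncome > 2000, Credit_History should be 1

Also, when the conditions for fillna() are less obvious, can we use a pivot table to compute the median values and then fill them in with fillna()?

Thanks in advance.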

1 answer:

Answer 0 (score: 1)

Use numpy.select. If none of the conditions is True, the output is defined by the default parameter:

import numpy as np
import pandas as pd
from itertools import product

c = ['Self_Employed','Married','ApplicantIncome']
df = pd.DataFrame(list(product(list('NY'), list('NY'), [10000, 30000])), columns=c)

m1 = (df.Self_Employed == 'Y') & (df.Married == 'N')
m2 = (df.Self_Employed == 'N') & (df.ApplicantIncome > 20000)
m3 = m1 & (df.ApplicantIncome > 20000)

df['Credit_History'] = np.select([m1, m2, m3], [0,1,1], default=2)
print (df)
  Self_Employed Married  ApplicantIncome  Credit_History
0             N       N            10000               2
1             N       N            30000               1
2             N       Y            10000               2
3             N       Y            30000               1
4             Y       N            10000               0
5             Y       N            30000               0
6             Y       Y            10000               2
7             Y       Y            30000               2

To replace only the NaN values according to the conditions, wrap the result in a Series and pass it to fillna:

c = ['Self_Employed','Married','ApplicantIncome']
df =  pd.DataFrame(list(product(list('NY'), list('NY'), [10000, 30000])), 
                   columns=c).assign(Credit_History=[np.nan,1,0, np.nan] *2)
print (df)
  Self_Employed Married  ApplicantIncome  Credit_History
0             N       N            10000             NaN
1             N       N            30000             1.0
2             N       Y            10000             0.0
3             N       Y            30000             NaN
4             Y       N            10000             NaN
5             Y       N            30000             1.0
6             Y       Y            10000             0.0
7             Y       Y            30000             NaN

m1 = (df.Self_Employed == 'Y') & (df.Married == 'N')
m2 = (df.Self_Employed == 'N') & (df.ApplicantIncome > 20000)
m3 = m1 & (df.ApplicantIncome > 20000)

s = pd.Series(np.select([m1, m2, m3], [0,1,1], default=2), index=df.index)
df['Credit_History'] = df['Credit_History'].fillna(s)
print (df)
  Self_Employed Married  ApplicantIncome  Credit_History
0             N       N            10000             2.0
1             N       N            30000             1.0
2             N       Y            10000             0.0
3             N       Y            30000             1.0
4             Y       N            10000             0.0
5             Y       N            30000             1.0
6             Y       Y            10000             0.0
7             Y       Y            30000             2.0
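
One thing to be aware of: numpy.select takes the value of the first condition that matches, and m3 is a subset of m1, so with the order [m1, m2, m3] the value for m3 is never used. If the more specific Case 3 is meant to override Case 1 (an assumption about the intent, not something the original answer states), one option is to list m3 first. The sketch below reuses the answer's sample data and its 20000 threshold.

```python
from itertools import product

import numpy as np
import pandas as pd

c = ['Self_Employed', 'Married', 'ApplicantIncome']
df = pd.DataFrame(list(product(list('NY'), list('NY'), [10000, 30000])),
                  columns=c).assign(Credit_History=[np.nan, 1, 0, np.nan] * 2)

m1 = (df.Self_Employed == 'Y') & (df.Married == 'N')
m2 = (df.Self_Employed == 'N') & (df.ApplicantIncome > 20000)
m3 = m1 & (df.ApplicantIncome > 20000)

# m3 is listed first so that it takes precedence over the broader m1.
s = pd.Series(np.select([m3, m1, m2], [1, 0, 1], default=2), index=df.index)

# fillna only touches the rows where Credit_History is NaN.
filled = df['Credit_History'].fillna(s)
print(s.tolist())
print(filled.tolist())
```

With this sample data the filled column is the same as before, because row 5 (the only row matching m3) already has a value. The difference shows up in s, where row 5 is now 1 instead of 0.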

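In the fillna version, the conditions are applied only where Credit_History is NaN; rows that already had a value (1, 2, 5 and 6) keep it.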
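
On the pivot-table part of the question: a pivot table of medians can be computed with pivot_table, but for filling it is usually easier to use groupby with transform, which returns the group median aligned to the original rows so it can be passed straight to fillna. The following is a minimal sketch on hypothetical toy data; the choice of grouping columns is an assumption and would depend on the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are made up for illustration.
df = pd.DataFrame({
    'Self_Employed': ['Y', 'Y', 'Y', 'N', 'N', 'N'],
    'Married': ['N', 'N', 'N', 'Y', 'Y', 'Y'],
    'Credit_History': [0.0, 0.0, np.nan, 1.0, np.nan, 1.0],
})

# Median of Credit_History per (Self_Employed, Married) group, as a table.
medians = df.pivot_table(values='Credit_History',
                         index='Self_Employed',
                         columns='Married',
                         aggfunc='median')

# The same medians, aligned row by row with df so fillna can use them.
group_median = (df.groupby(['Self_Employed', 'Married'])['Credit_History']
                  .transform('median'))
df['Credit_History'] = df['Credit_History'].fillna(group_median)
print(medians)
print(df)
```

A group whose Credit_History values are all NaN has no median, so its rows stay NaN and would need a fallback such as a second fillna with a fixed value.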
