I have two datasets, one in `data1` and the other in `data2`. I combine them with:

```python
data = pandas.concat([data1, data2])
```

`data1` and `data2` have the same layout, for example:
```text
    Loan_ID Gender Married Dependents     Education Self_Employed  \
0  LP001002   Male      No          0      Graduate            No
1  LP001003   Male     Yes          1      Graduate            No
2  LP001005   Male     Yes          0      Graduate           Yes
3  LP001006   Male     Yes          0  Not Graduate            No

   ApplicantIncome  CoapplicantIncome  LoanAmount  Loan_Amount_Term  \
0             5849                0.0         NaN             360.0
1             4583             1508.0       128.0             360.0
2             3000                0.0        66.0             360.0
3             2583             2358.0       120.0             360.0

   Credit_History Property_Area Loan_Status
0             1.0         Urban           Y
1             1.0         Rural           N
2             1.0         Urban           Y
3             1.0         Urban           Y
```
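One detail of `pandas.concat` that may matter here: by default it keeps each input's original index labels, so if `data1` and `data2` each have labels starting at 0 (as in the printout above), the combined frame has repeated labels. A minimal sketch with two hypothetical tiny frames, `df_a` and `df_b`, standing in for `data1` and `data2`:

```python
import pandas as pd

# Two small stand-in frames (hypothetical values, not the loan dataset)
df_a = pd.DataFrame({"LoanAmount": [128.0, 66.0, 120.0]})
df_b = pd.DataFrame({"LoanAmount": [100.0, 90.0, 700.0]})

# Default concat keeps the original index labels of each input
combined = pd.concat([df_a, df_b])
print(combined.index.tolist())         # [0, 1, 2, 0, 1, 2]
print(combined.index.is_unique)        # False

# ignore_index=True builds a fresh 0..n-1 index instead
combined_clean = pd.concat([df_a, df_b], ignore_index=True)
print(combined_clean.index.tolist())   # [0, 1, 2, 3, 4, 5]
print(combined_clean.index.is_unique)  # True
```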
I wrote a function that replaces outliers in a column with the column median:
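```python
def treat_outlier(x):
    # Lower and upper quartiles of column x of the global DataFrame `data`
    a, b = data[x].quantile([.25, .75])
    IQR = b - a
    # Replace every value outside [a - 1.5*IQR, b + 1.5*IQR] with the median.
    # Note: data[x][i] looks up the index *label* i, not the i-th position.
    for i in range(len(data[x])):
        if (data[x][i] < a - (1.5 * IQR) or (data[x][i] > b + (1.5 * IQR))):
            data[x][i] = data[x].median()
```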
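A likely explanation for why the function works on each frame but not on the concatenated one: when an index label occurs more than once, `data[x][i]` is a label lookup that returns a Series rather than a single number, and Python's `or` then has to call `bool()` on a boolean Series. A small sketch with toy values (the names `s`, `picked` and `outcome` are hypothetical):

```python
import pandas as pd

# Stand-in for one column of a concatenated frame: labels 0..2 appear twice
s = pd.concat([pd.Series([5849, 4583, 3000]), pd.Series([2583, 3000, 4000])])

# With a duplicated label, s[0] is label-based and returns a 2-element Series
picked = s[0]
print(type(picked).__name__)  # Series
print(picked.tolist())        # [5849, 2583]

# Comparing that Series to a number gives a boolean Series; `or` calls
# bool() on it, which pandas refuses with a ValueError
try:
    if picked < 1000 or picked > 5000:
        pass
    outcome = "no error"
except ValueError as exc:
    outcome = str(exc)
print(outcome)  # The truth value of a Series is ambiguous. ...
```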
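When I apply this function to `data1` and `data2` separately it works, but when I apply it to the combined dataset `data` I get this error:

> The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Why does this happen?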
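A sketch of one way around it, assuming the goal is only the 1.5 * IQR median replacement: build `data` with `ignore_index=True` (or call `data.reset_index(drop=True)`) so labels are unique again, and replace the row-by-row loop with a boolean mask combined with `|`. Passing the frame in as a parameter instead of reading the global `data` is an assumption made for illustration, and unlike the loop, the median here is computed once on the unmodified column. The sample values are hypothetical.

```python
import pandas as pd


def treat_outlier(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Return a copy of df where values of `col` outside the 1.5*IQR fences
    are replaced by the column median (computed once, before replacement).

    Assumption: takes the frame as a parameter instead of the global `data`.
    """
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    out = df.copy()
    # Element-wise comparison; `|` works on Series, Python's `or` does not
    is_outlier = (out[col] < lower) | (out[col] > upper)
    out.loc[is_outlier, col] = out[col].median()
    return out


# Hypothetical sample frames with the same layout
data1 = pd.DataFrame({"ApplicantIncome": [5849.0, 4583.0, 3000.0, 2583.0]})
data2 = pd.DataFrame({"ApplicantIncome": [3000.0, 2500.0, 4000.0, 81000.0]})

# ignore_index=True gives unique labels 0..7
data = pd.concat([data1, data2], ignore_index=True)
cleaned = treat_outlier(data, "ApplicantIncome")
print(cleaned["ApplicantIncome"].tolist())
```

With these values only 81000.0 lies above the upper fence, so it is replaced by the median of the original column (3500.0), and `data` itself is left unchanged.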