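I have a dataframe (df) with several columns, one of which is loan_status. I want to create a new column that classifies each entry as a loan in good standing or bad standing. The breakdown of the loan_status column is as follows: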
Charged Off                                           38029
Current                                              520784
Default                                                 199
Does not meet the credit policy. Status:Charged Off     758
Does not meet the credit policy. Status:Current          15
Does not meet the credit policy. Status:Fully Paid     1947
Fully Paid                                           176203
In Grace Period                                        6486
Issued                                                  142
Late (16-30 days)                                      2337
Late (31-120 days)                                     9949
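A breakdown like this is what `value_counts()` typically prints. As a minimal sketch on a toy frame (the data and the name `toy` are hypothetical, not the real loan data), note that the default output leaves out missing entries, while `dropna=False` counts them too:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real loan_status column.
toy = pd.DataFrame({"loan_status": ["Current", "Charged Off", np.nan, "Current"]})

# Default behaviour: NaN entries are excluded from the counts.
counts = toy["loan_status"].value_counts()

# dropna=False adds a row for the missing values.
counts_with_nan = toy["loan_status"].value_counts(dropna=False)

print(counts)
print(counts_with_nan)
```

If the second count is larger than the first, the column contains missing values that the plain breakdown does not show.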
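To build the new column, I did the following:
df["badloan"] = df.loan_status
df.loc[(df.badloan == "Charged Off") |
       (df.badloan == "Default") |
       (df.badloan == "Does not meet the credit policy. Status:Charged Off"),
       df.badloan] = "Bad"
This gave me the following error:
ValueError: cannot index with vector containing NA / NaN values
This made me realize there must be some NaN values in the data, so out of curiosity I compared the new column with the original.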
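This returned mostly True values, but some were False. I then tried redefining df.badloan in case I had modified it by accident:
df["badloan"] == df.loan_status
This still produced some False values. Any idea why?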
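For reference, a minimal sketch of the two observations on a toy frame. The data and the names `toy`, `same`, `identical`, and `mask` are hypothetical. It rests on one assumption: that the ValueError comes from passing `df.badloan` (the column's values, NaN included) where `.loc` expects a column label.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real frame is much larger.
toy = pd.DataFrame({"loan_status": ["Current", "Charged Off", np.nan, "Default"]})
toy["badloan"] = toy["loan_status"]

# Element-wise comparison: NaN never equals NaN, so that row is False
# even though both columns hold identical data.
same = toy["badloan"] == toy["loan_status"]

# Series.equals treats NaNs in the same position as equal.
identical = toy["badloan"].equals(toy["loan_status"])

# The rows where the comparison is False are exactly the missing ones.
false_rows_are_nan = toy.loc[~same, "loan_status"].isna().all()

# Assumption: using the column name "badloan" as the column indexer,
# rather than the column's values, avoids the ValueError.
mask = toy["badloan"].isin(["Charged Off", "Default"])
toy.loc[mask, "badloan"] = "Bad"

print(same.tolist(), identical)
print(toy)
```

Under that reading, the False values do not mean the column was modified. They only mark rows where loan_status is missing.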