Data source: Credit Card Default Prediction Dataset
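I want to normalize (divide) the bill amounts and payment amounts from each user's past 6 months (12 columns in total: "BILL_AMT1", "BILL_AMT2", ..., "BILL_AMT6", "PAY_AMT1", ..., "PAY_AMT6") by their credit limit, "LIMIT_BAL".
So, for example, if a user's bill for the first month, "BILL_AMT1", is $7,000 USD and their total credit limit, "LIMIT_BAL", is $10,000 USD, I want "BILL_AMT1" to become 0.7. I want to do this for all 12 columns, from "BILL_AMT1" all the way through "PAY_AMT6".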
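To make the arithmetic concrete, here is a minimal sketch on a made-up two-customer frame: customer 0 matches the $7,000 / $10,000 example above, while customer 1's numbers are hypothetical. Each bill is divided by the credit limit on the same row.

```python
import pandas as pd

# Made-up data: customer 0 matches the example above, customer 1 is hypothetical
df = pd.DataFrame({"LIMIT_BAL": [10_000, 50_000], "BILL_AMT1": [7_000, 5_000]})

# Each bill is divided by the credit limit on the same row
df["BILL_AMT1"] = df["BILL_AMT1"] / df["LIMIT_BAL"]

print(df["BILL_AMT1"].tolist())  # [0.7, 0.1]
```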
Here is a very clumsy way to do it, but I believe there is a better way:
import pandas as pd
import numpy as np
# Read and preview the data
df = pd.read_excel("C:\\JOE\\Data Science & Machine Learning\\PROJECT - CREDIT CARD DEFAULT PREDICTION\\default of credit card clients.xls", sheet_name="Data", header=1, index_col=0)
# list_to_normalize = ["BILL_AMT1", ... "PAY_AMT6"]
list_of_bill_amt = ["BILL_AMT"+str(i) for i in range(1,7)]
list_of_pay_amt = ["PAY_AMT"+str(i) for i in range(1,7)]
list_to_normalize = list_of_bill_amt + list_of_pay_amt
for item in list_to_normalize:
    # Divide each column by the customer's credit limit, row by row
    df[item] = df[item].div(df["LIMIT_BAL"], axis=0)
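The loop can likely be replaced by a single vectorized call: `DataFrame.div` with `axis=0` aligns the `LIMIT_BAL` Series with the row index, so every row of the 12 selected columns is divided by that row's own limit. This is a minimal sketch on made-up toy data standing in for the Excel sheet; the values are hypothetical.

```python
import pandas as pd

# Build the same 12 column names as in the question
bill_cols = [f"BILL_AMT{i}" for i in range(1, 7)]
pay_cols = [f"PAY_AMT{i}" for i in range(1, 7)]
cols = bill_cols + pay_cols

# Hypothetical toy data (made-up values), LIMIT_BAL first
df = pd.DataFrame(
    [[10_000] + [7_000] * 6 + [1_000] * 6,
     [50_000] + [5_000] * 6 + [2_500] * 6],
    columns=["LIMIT_BAL"] + cols,
)

# One call for all 12 columns; axis=0 matches LIMIT_BAL to rows, not columns
df[cols] = df[cols].div(df["LIMIT_BAL"], axis=0)

print(df.loc[0, "BILL_AMT1"], df.loc[1, "PAY_AMT6"])  # 0.7 0.05
```

If you would rather not spell out the names, `df.filter(regex=r"^(BILL|PAY)_AMT[1-6]$").columns` should yield the same 12 labels; that is just an alternative way to build `cols`.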
I know all 12 columns are contiguous, so why can't I do something like this?
df.loc["BILL_AMT1":"PAY_AMT6"] = df.loc["BILL_AMT1":"PAY_AMT6"].div(df.loc["LIMIT_BAL"], axis=0)
df.columns["BILL_AMT1":"PAY_AMT6"] = df.columns["BILL_AMT1":"PAY_AMT6"] / df.columns["LIMIT_BAL"]
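As far as I can tell, the first attempt fails because `.loc` is indexed as `[rows, columns]`: a lone slice like `df.loc["BILL_AMT1":"PAY_AMT6"]` is applied to the row index, and `df.loc["LIMIT_BAL"]` looks for a row labelled `LIMIT_BAL`. The second attempt works on `df.columns`, which holds only the labels, not the data, so dividing it could not change the values. A minimal sketch of a label-based column slice, on a hypothetical frame whose column order mimics the sheet (`AGE` and `DEFAULT` are only there to show the slice stops at the right places; `DEFAULT` is a made-up name):

```python
import pandas as pd

# Hypothetical frame: the 12 amount columns are contiguous, with other columns around them
amts = [f"BILL_AMT{i}" for i in range(1, 7)] + [f"PAY_AMT{i}" for i in range(1, 7)]
df = pd.DataFrame(
    [[10_000, 35] + [7_000] * 12 + [0]],
    columns=["LIMIT_BAL", "AGE"] + amts + ["DEFAULT"],
)

# .loc is [rows, columns]: put ":" (all rows) first, then the column range.
# Label slices include both endpoints.
amt_cols = df.loc[:, "BILL_AMT1":"PAY_AMT6"].columns

# Assigning via df[...] replaces whole columns, so the int -> float change is fine
df[amt_cols] = df[amt_cols].div(df["LIMIT_BAL"], axis=0)

print(list(amt_cols) == amts, df.loc[0, "PAY_AMT6"])  # True 0.7
```

Taking the labels first and then assigning through `df[amt_cols]` is a deliberate choice: writing the result back with `df.loc[:, "BILL_AMT1":"PAY_AMT6"] = ...` tries to set values in place, and recent pandas versions may warn when float ratios are written into the integer columns that `read_excel` produces.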
Thanks a lot for your help! I really appreciate it!