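I'm trying to write an object-oriented class in Python that can fit training data with `self.fit(X, y)`, predict class labels on new data with `self.predict(X)`, predict the probability of each label with `self.predict_proba(X)`, and return the log-loss value with `self.evaluate(X, y)`. All preprocessing steps must be contained in the class. I also need unit tests. My data is credit-related information, and the class tries to predict loan defaults. The data has some categorical columns, which I converted into dummy variables. The target variable is "is_bad" (0 for good credit, 1 for bad credit).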
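A minimal sketch of the interface described above. It assumes scikit-learn's `Pipeline` and `ColumnTransformer`, so that dummy encoding (`OneHotEncoder`) and scaling are learned inside `fit` and reapplied automatically in `predict`, `predict_proba` and `evaluate`. The class name `LoanDefaultModel`, the column `annual_inc`, and the synthetic data are illustrative assumptions, not taken from the original data set.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class LoanDefaultModel:
    """Hypothetical wrapper: all preprocessing plus a logistic regression behind one API."""

    def __init__(self, categorical_cols, numeric_cols, random_state=0):
        preprocess = ColumnTransformer([
            # Dummy-encode categoricals; unseen categories at predict time become all-zero rows
            ("cat", OneHotEncoder(handle_unknown="ignore"), list(categorical_cols)),
            # Scale numeric columns with statistics learned from the training data only
            ("num", StandardScaler(), list(numeric_cols)),
        ])
        self.pipeline = Pipeline([
            ("preprocess", preprocess),
            ("clf", LogisticRegression(random_state=random_state, max_iter=1000)),
        ])

    def fit(self, X, y):
        self.pipeline.fit(X, y)
        return self  # returning self allows model.fit(X, y).predict(X_new)

    def predict(self, X):
        return self.pipeline.predict(X)

    def predict_proba(self, X):
        return self.pipeline.predict_proba(X)

    def evaluate(self, X, y):
        # Passing labels keeps log_loss working even if y holds only one class
        return log_loss(y, self.predict_proba(X), labels=self.pipeline.classes_)


# Illustrative usage on synthetic data (column names are assumptions)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "home_ownership": rng.choice(["RENT", "OWN", "MORTGAGE"], size=n),
    "annual_inc": rng.normal(60000, 15000, size=n),
})
y = (df["annual_inc"] < 55000).astype(int)
model = LoanDefaultModel(["home_ownership"], ["annual_inc"]).fit(df, y)
print(model.evaluate(df, y))
```

Keeping the encoder and scaler inside the pipeline means they are fitted only on whatever data is passed to `fit`, which avoids leaking test-set statistics into training.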
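In my code, I first define a parent class `Df` that holds `x_train`, `x_test`, `y_train`, and `y_test`. Then I create a child class `Mod` containing a logistic regression model. However, I'm not sure whether my understanding of object-oriented classes is correct. How should such a class be written? I'd also like to know: what is a unit test, and how do I write unit tests for a logistic regression model?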
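A unit test checks one small piece of behaviour in isolation against an expectation you can state in advance, and fails loudly if that behaviour changes. For a model class you usually cannot assert exact predictions, so this sketch, using Python's standard `unittest` module, tests contracts instead: output shapes, the set of labels, probabilities summing to 1, `evaluate` agreeing with `sklearn.metrics.log_loss`, and an error when predicting before fitting. `TinyModel` is a hypothetical numeric-only stand-in so the block runs on its own; in practice you would import your own class.

```python
import unittest

import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class TinyModel:
    """Hypothetical stand-in for your own class; import the real one instead."""

    def __init__(self):
        self.pipeline = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))

    def fit(self, X, y):
        self.pipeline.fit(X, y)
        return self

    def predict(self, X):
        return self.pipeline.predict(X)

    def predict_proba(self, X):
        return self.pipeline.predict_proba(X)

    def evaluate(self, X, y):
        return log_loss(y, self.predict_proba(X), labels=self.pipeline.classes_)


class TestTinyModel(unittest.TestCase):
    def setUp(self):
        # Small, seeded data set: the label is 1 whenever the first feature is positive
        rng = np.random.default_rng(42)
        self.X = rng.normal(size=(100, 3))
        self.y = (self.X[:, 0] > 0).astype(int)
        self.model = TinyModel().fit(self.X, self.y)

    def test_predict_returns_one_binary_label_per_row(self):
        pred = self.model.predict(self.X)
        self.assertEqual(pred.shape, (100,))
        self.assertTrue(set(np.unique(pred)) <= {0, 1})

    def test_predict_proba_rows_sum_to_one(self):
        proba = self.model.predict_proba(self.X)
        self.assertEqual(proba.shape, (100, 2))
        np.testing.assert_allclose(proba.sum(axis=1), 1.0)

    def test_evaluate_matches_sklearn_log_loss(self):
        expected = log_loss(self.y, self.model.predict_proba(self.X))
        self.assertAlmostEqual(self.model.evaluate(self.X, self.y), expected)

    def test_learns_an_easy_signal(self):
        # A loose sanity bound rather than an exact value
        self.assertGreater((self.model.predict(self.X) == self.y).mean(), 0.9)

    def test_predict_before_fit_raises(self):
        with self.assertRaises(NotFittedError):
            TinyModel().predict(self.X)
```

Saved as, say, `test_model.py` (a hypothetical file name), the tests run with `python -m unittest -v test_model`. Seeding the random generator keeps the data, and therefore the assertions, deterministic.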
```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

class Df:
    def __init__(self, path="Lending_Club_DropNA.csv"):
        data = pd.read_csv(path)
        # Integer-encode each categorical column (a fresh encoder per column)
        for col in ['home_ownership', 'verification_status', 'purpose_cat', 'pymnt_plan']:
            data['NU_' + col] = LabelEncoder().fit_transform(data[col])
        dat = data.drop(['home_ownership', 'verification_status', 'purpose_cat', 'pymnt_plan',
                         'policy_code', 'zip_code', 'addr_state', 'initial_list_status'], axis=1)
        X, y = dat.drop(columns='is_bad'), dat['is_bad']  # select the target by name, not position
        self.x_train, self.x_test, self.y_train, self.y_test = train_test_split(
            X, y, test_size=0.3, random_state=123)
        # The scaler must be fitted (on the training split only) before it can transform
        self.sc = StandardScaler()
        self.x_train = self.sc.fit_transform(self.x_train)
        self.x_test = self.sc.transform(self.x_test)

class Mod(Df):
    def __init__(self, path="Lending_Club_DropNA.csv"):  # two underscores on each side
        super().__init__(path)
        self.model = LogisticRegression(random_state=0)

    def fit(self, X, y):
        self.model.fit(X, y)
        return self

    def predict(self, X):
        return self.model.predict(X)

    def predict_proba(self, X):
        return self.model.predict_proba(X)

    def evaluate(self, X, y):
        # Log loss on the data passed in (e.g. the test split), not on the training data
        return log_loss(y, self.predict_proba(X))
```
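The question says the categorical columns were turned into dummy variables, while the code uses `LabelEncoder`, which produces integer codes instead. If `pd.get_dummies` is used for that step, one pitfall when the preprocessing lives inside a class is that new data may contain a different set of categories, so its dummy columns no longer line up with the training columns. A minimal sketch, assuming a hypothetical `DummyEncoder` helper that remembers the training layout and reindexes new data to it (the column `annual_inc` and the category values are illustrative):

```python
import pandas as pd


class DummyEncoder:
    """Hypothetical helper: pd.get_dummies that remembers the training column layout."""

    def __init__(self, categorical_cols):
        self.categorical_cols = list(categorical_cols)
        self.columns_ = None

    def fit_transform(self, X):
        out = pd.get_dummies(X, columns=self.categorical_cols, dtype=float)
        self.columns_ = out.columns  # remember exactly which columns training produced
        return out

    def transform(self, X):
        out = pd.get_dummies(X, columns=self.categorical_cols, dtype=float)
        # Add missing dummy columns as 0, drop categories never seen in training, keep the order
        return out.reindex(columns=self.columns_, fill_value=0.0)


train = pd.DataFrame({"home_ownership": ["RENT", "OWN", "MORTGAGE"],
                      "annual_inc": [40.0, 80.0, 65.0]})
new = pd.DataFrame({"home_ownership": ["RENT", "OTHER"],
                    "annual_inc": [50.0, 30.0]})

enc = DummyEncoder(["home_ownership"])
train_enc = enc.fit_transform(train)
new_enc = enc.transform(new)
print(new_enc)
```

Categories not seen during training (here `OTHER`) are dropped, and training categories absent from the new data become all-zero columns, so the model always receives the same columns in the same order. scikit-learn's `OneHotEncoder(handle_unknown="ignore")` addresses the same problem.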