Python class for logistic regression

Time: 2020-03-25 04:13:17

Tags: logistic-regression object-oriented-analysis python-class

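I'm trying to write an object-oriented class in Python that can fit training data with self.fit(X, y), predict class labels on new data with self.predict(X), predict the probability of each label with self.predict_proba(X), and return the LogLoss value with self.evaluate(X, y). All preprocessing steps must be contained in the class. I also need unit tests. My data is credit-related information, and the class tries to predict loan defaults. The data has some categorical columns, which I encoded as numeric variables with LabelEncoder. The y variable is "is_bad" (0 for good credit, 1 for bad credit).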
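One way to keep all preprocessing inside the class is to wrap scikit-learn's ColumnTransformer and Pipeline, so that fit() learns the encoding and scaling from the training data and predict(), predict_proba() and evaluate() reapply exactly the same steps to new data. This is a minimal sketch under assumptions: the class name CreditDefaultModel and the column annual_inc are hypothetical, the data is synthetic, and OneHotEncoder is swapped in to produce real dummy variables instead of LabelEncoder codes.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class CreditDefaultModel:
    """Hypothetical class: logistic regression with preprocessing kept inside."""

    def __init__(self, categorical_cols, numeric_cols, random_state=0):
        preprocess = ColumnTransformer([
            # Dummy (one-hot) encoding; categories unseen during fit become all-zero columns
            ("cat", OneHotEncoder(handle_unknown="ignore"), list(categorical_cols)),
            # Standardize numeric columns with statistics learned in fit()
            ("num", StandardScaler(), list(numeric_cols)),
        ])
        self.pipeline = Pipeline([
            ("preprocess", preprocess),
            ("clf", LogisticRegression(random_state=random_state, max_iter=1000)),
        ])

    def fit(self, X, y):
        self.pipeline.fit(X, y)
        return self

    def predict(self, X):
        return self.pipeline.predict(X)

    def predict_proba(self, X):
        return self.pipeline.predict_proba(X)

    def evaluate(self, X, y):
        # LogLoss of the predicted probabilities against the true labels
        classes = self.pipeline.named_steps["clf"].classes_
        return log_loss(y, self.predict_proba(X), labels=classes)


# Usage on synthetic data (annual_inc is a hypothetical numeric column)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "home_ownership": rng.choice(["RENT", "OWN", "MORTGAGE"], size=200),
    "annual_inc": rng.normal(60000, 15000, size=200),
})
y = (df["annual_inc"] + rng.normal(0, 10000, size=200) < 60000).astype(int)

model = CreditDefaultModel(["home_ownership"], ["annual_inc"]).fit(df, y)
new_row = pd.DataFrame({"home_ownership": ["OTHER"], "annual_inc": [50000.0]})
print(model.predict(new_row), round(model.evaluate(df, y), 3))
```

Because the encoder and scaler are fitted inside fit(), raw new rows (even with a category never seen in training) can be passed straight to predict().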

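In my code, I first define a parent class Df that holds x_train, x_test, y_train and y_test. Then I create a child class Mod that holds the logistic model. However, I'm not sure whether my understanding of object-oriented classes is correct. How should such a class be written? Also, what is a unit test, and how do I write unit tests for logistic regression?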
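A unit test checks one small unit of code (a function or method) in isolation: it runs it on a known input and asserts that the output has the expected properties. For a logistic-regression class, reasonable checks are output shapes, that predicted labels are valid classes, that probabilities sum to 1, and that evaluate() agrees with the log-loss formula LogLoss = -mean(y*log(p) + (1-y)*log(1-p)). The sketch below uses Python's built-in unittest module; LogitModel is a hypothetical minimal class standing in for the real one, and the data is synthetic.

```python
import unittest

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class LogitModel:
    """Hypothetical minimal class under test: scaling + logistic regression."""

    def __init__(self):
        self.pipeline = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))

    def fit(self, X, y):
        self.pipeline.fit(X, y)
        return self

    def predict(self, X):
        return self.pipeline.predict(X)

    def predict_proba(self, X):
        return self.pipeline.predict_proba(X)

    def evaluate(self, X, y):
        return log_loss(y, self.predict_proba(X))


class TestLogitModel(unittest.TestCase):
    def setUp(self):
        # Synthetic data: the label depends on the first feature plus noise
        rng = np.random.default_rng(42)
        self.X = rng.normal(size=(300, 3))
        self.y = (self.X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)
        self.model = LogitModel().fit(self.X, self.y)

    def test_predict_returns_valid_labels(self):
        preds = self.model.predict(self.X)
        self.assertEqual(preds.shape, (300,))
        self.assertTrue(set(np.unique(preds)) <= {0, 1})

    def test_probabilities_sum_to_one(self):
        proba = self.model.predict_proba(self.X)
        self.assertEqual(proba.shape, (300, 2))
        np.testing.assert_allclose(proba.sum(axis=1), 1.0)

    def test_evaluate_matches_log_loss_formula(self):
        # p is the predicted probability of class 1
        p = self.model.predict_proba(self.X)[:, 1]
        expected = -np.mean(self.y * np.log(p) + (1 - self.y) * np.log(1 - p))
        self.assertAlmostEqual(self.model.evaluate(self.X, self.y), expected, places=6)

    def test_beats_majority_baseline(self):
        # With a learnable signal, accuracy should exceed always predicting the majority class
        accuracy = np.mean(self.model.predict(self.X) == self.y)
        baseline = max(self.y.mean(), 1 - self.y.mean())
        self.assertGreater(accuracy, baseline)
```

Saved in a file such as test_model.py (hypothetical name), the tests can be run with python -m unittest test_model.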

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

class Df():
    # Loads the data, label-encodes categorical columns, then splits and scales.
    def __init__(self, path="Lending_Club_DropNA.csv"):
        data = pd.read_csv(path)
        lb_make = LabelEncoder()
        for col in ['home_ownership', 'verification_status', 'purpose_cat', 'pymnt_plan', 'policy_code']:
            data['NU_' + col] = lb_make.fit_transform(data[col])
        dat = data.drop(['home_ownership', 'verification_status', 'purpose_cat', 'pymnt_plan',
                         'policy_code', 'zip_code', 'addr_state', 'initial_list_status'], axis=1)
        X, Y = dat.loc[:, dat.columns != "is_bad"], dat["is_bad"]
        self.x_train, self.x_test, self.y_train, self.y_test = train_test_split(
            X, Y, test_size=0.3, random_state=123)
        # The scaler must be fitted (on the training split only) before transform()
        self.sc = StandardScaler()
        self.x_train = self.sc.fit_transform(self.x_train)
        self.x_test = self.sc.transform(self.x_test)

class Mod(Df):
    # Logistic regression on top of the data prepared by Df.
    def __init__(self, path="Lending_Club_DropNA.csv"):
        super().__init__(path)
        self.L = LogisticRegression(random_state=0)

    def fit(self, X, y):
        self.L.fit(X, y)
        return self

    def predict(self, X):
        return self.L.predict(X)

    def predict_proba(self, X):
        return self.L.predict_proba(X)

    def evaluate(self, X, y):
        # Log loss compares true labels with predicted probabilities for the SAME rows
        return log_loss(y, self.predict_proba(X))
```
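StandardScaler has to be fitted before transform() is called, and the test split should reuse the mean and standard deviation learned from the training split rather than being refitted. A minimal illustration on toy one-feature arrays (the values are made up for the example):

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import StandardScaler

# Toy one-feature data standing in for x_train / x_test
x_train = np.array([[1.0], [2.0], [3.0]])
x_test = np.array([[4.0]])

sc = StandardScaler()
failed_before_fit = False
try:
    sc.transform(x_train)  # transform() on an unfitted scaler
except NotFittedError:
    failed_before_fit = True

# Learn mean and standard deviation from the training split only ...
x_train_scaled = sc.fit_transform(x_train)
# ... and reuse those same statistics on the test split (no refitting)
x_test_scaled = sc.transform(x_test)
print(failed_before_fit, sc.mean_, x_test_scaled.ravel())
```

Refitting the scaler on the test data would leak test-set statistics into preprocessing and make the evaluation less honest.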
