Referencing numeric identifiers from predicted targets

Time: 2018-01-24 16:00:54

Tags: python pandas machine-learning scikit-learn

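I have built a machine learning model that can accurately predict potentially fraudulent customers. My dataset has about 12,000 observations and about 43 features (about 147 after I engineered a few more). Each row corresponds to a different customer, and I have a feature called CUSTOMER_NUMBER that holds each customer's numeric identifier (so there are about 12,000 customers and about 12,000 unique numeric identifiers). Since this feature is an identifier, I don't include it in the machine learning model itself; instead I drop it from the original dataframe (along with other features such as dates):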

df = df.drop(['CUSTOMER_NUMBER', 'TRANSACT_DT', 'CUSTOMER_NAME'], axis=1)

After running my model

import pandas as pd
from sklearn.svm import SVC

# Name of the most recent loss-year flag column:
# train loss years = 2014-2016, test loss year = 2017
this_year = 'LY_' + str(pd.to_datetime('today').year - 1)

# Partition into train and test sets using this loss-year flag
train = df.loc[df[this_year] == 0]
test = df.loc[df[this_year] == 1]

X_train = train.drop(['Target_Variable'], axis=1)
X_test = test.drop(['Target_Variable'], axis=1)
y_train = train['Target_Variable']
y_test = test['Target_Variable']
training_data = X_train, y_train
test_data = X_test, y_test

clf2 = SVC(C=100, kernel='linear', class_weight='balanced')

#class_weight = class_weight.compute_class_weight('balanced', np.unique(y_train), y_train)
clf2.fit(X_train, y_train)
X = df.drop(['Target_Variable'], axis=1)

# Predicted labels for the test set
y_pred = clf2.predict(X_test)

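and evaluating my true positive rate, I am able to predict some of the rows where df['Target_Variable'] == 1. However, although I can get a list of the predicted 1s via print(y_pred==1), that does me no good if I don't know which customer number (which I dropped from my dataframe) each one corresponds to. So I was wondering whether anyone could help me figure out how to get the customer numbers corresponding to y_pred==1? Or should I not drop the feature df['CUSTOMER_NUMBER'] and include it in my model instead? If so, would it change my prediction results, given that it is just a unique numeric identifier?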

1 Answer:

Answer 0 (score: 0)

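I agree with the first comment. You should not include 'CUSTOMER_NUMBER' as a feature. Since every number is unique, it adds no value to your model. Just create a subset of features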

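so that 'CUSTOMER_NUMBER' is still in the dataframe when you want to look it up, and fit the model on that subset only (`features` being the list of column names to train on):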

clf2.fit(X_train[features], y_train)
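
To make the idea concrete, here is a minimal sketch of how it could look end to end. The toy data, the column names `amount` and `n_claims`, and the choice to leave the `LY_2017` split flag out of `features` are all assumptions made for illustration; only `CUSTOMER_NUMBER`, `Target_Variable` and the `SVC` settings come from the question.

```python
import pandas as pd
from sklearn.svm import SVC

# Toy data standing in for the real customer table (all values are made up).
df = pd.DataFrame({
    "CUSTOMER_NUMBER": [1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008],
    "amount":          [10.0, 12.0, 95.0, 11.0, 99.0, 9.0, 97.0, 13.0],
    "n_claims":        [0, 1, 6, 0, 7, 1, 8, 0],
    "LY_2017":         [0, 0, 0, 0, 0, 1, 1, 1],
    "Target_Variable": [0, 0, 1, 0, 1, 0, 1, 0],
})

# Columns the model is allowed to see: everything except the identifier,
# the split flag and the target (excluding the split flag is an assumption).
features = [c for c in df.columns
            if c not in ("CUSTOMER_NUMBER", "LY_2017", "Target_Variable")]

# CUSTOMER_NUMBER stays in both partitions; it is simply never passed to the model.
train = df.loc[df["LY_2017"] == 0]
test = df.loc[df["LY_2017"] == 1]

clf = SVC(C=100, kernel="linear", class_weight="balanced")
clf.fit(train[features], train["Target_Variable"])

y_pred = clf.predict(test[features])

# y_pred is positionally aligned with the rows of `test`,
# so a boolean mask picks out the matching identifiers.
flagged_customers = test.loc[y_pred == 1, "CUSTOMER_NUMBER"].tolist()
print(flagged_customers)
```

Because the identifier is never part of `features`, it cannot influence the predictions, yet it remains available for the lookup in the last step.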