How to make new predictions with GaussianNB in sklearn

Time: 2019-02-06 19:13:14

Tags: python scikit-learn gaussian naivebayes

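I have been stuck on this all day. I am following this tutorial, in which they implement GaussianNB with sklearn on the public income dataset.

What I am trying to figure out is how to make predictions on the fly once the model has been trained.

Example rows from the dataset: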

39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K
38, Private, 215646, HS-grad, 9, Divorced, Handlers-cleaners, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K
53, Private, 234721, 11th, 7, Married-civ-spouse, Handlers-cleaners, Husband, Black, Male, 0, 0, 40, United-States, <=50K

Their code:

# Required Python machine learning packages
import os
import pandas as pd
import numpy as np
# For preprocessing the data
from sklearn import preprocessing
# To split the dataset into train and test datasets
from sklearn.model_selection import train_test_split
# To model the Gaussian Naive Bayes classifier
from sklearn.naive_bayes import GaussianNB
# To calculate the accuracy score of the model
from sklearn.metrics import accuracy_score

adult_df = pd.read_csv(os.path.join(os.path.dirname(__file__), "adult.data"),
                       header=None, delimiter=' *, *', engine='python')

adult_df.columns = ['age', 'workclass', 'fnlwgt', 'education', 'education_num',
                    'marital_status', 'occupation', 'relationship',
                    'race', 'sex', 'capital_gain', 'capital_loss',
                    'hours_per_week', 'native_country', 'income']

adult_df_rev = adult_df

adult_df_rev.describe(include='all')

for value in ['workclass', 'education',
              'marital_status', 'occupation',
              'relationship', 'race', 'sex',
              'native_country', 'income']:
    # Replace the '?' placeholder with the column's most frequent value ('top')
    adult_df_rev[value] = adult_df_rev[value].replace(
        '?', adult_df_rev[value].describe()['top'])


le = preprocessing.LabelEncoder()
workclass_cat = le.fit_transform(adult_df.workclass)
education_cat = le.fit_transform(adult_df.education)
marital_cat = le.fit_transform(adult_df.marital_status)
occupation_cat = le.fit_transform(adult_df.occupation)
relationship_cat = le.fit_transform(adult_df.relationship)
race_cat = le.fit_transform(adult_df.race)
sex_cat = le.fit_transform(adult_df.sex)
native_country_cat = le.fit_transform(adult_df.native_country)

# initialize the encoded categorical columns
adult_df_rev['workclass_cat'] = workclass_cat
adult_df_rev['education_cat'] = education_cat
adult_df_rev['marital_cat'] = marital_cat
adult_df_rev['occupation_cat'] = occupation_cat
adult_df_rev['relationship_cat'] = relationship_cat
adult_df_rev['race_cat'] = race_cat
adult_df_rev['sex_cat'] = sex_cat
adult_df_rev['native_country_cat'] = native_country_cat

# drop the old categorical columns from dataframe
dummy_fields = ['workclass', 'education', 'marital_status',
                'occupation', 'relationship', 'race',
                'sex', 'native_country']
adult_df_rev = adult_df_rev.drop(dummy_fields, axis=1)

# reindex_axis was removed from pandas; reindex(columns=...) reorders the columns
adult_df_rev = adult_df_rev.reindex(columns=['age', 'workclass_cat', 'fnlwgt', 'education_cat',
                                             'education_num', 'marital_cat', 'occupation_cat',
                                             'relationship_cat', 'race_cat', 'sex_cat', 'capital_gain',
                                             'capital_loss', 'hours_per_week', 'native_country_cat',
                                             'income'])

adult_df_rev.head(1)

num_features = ['age', 'workclass_cat', 'fnlwgt', 'education_cat', 'education_num',
                'marital_cat', 'occupation_cat', 'relationship_cat', 'race_cat',
                'sex_cat', 'capital_gain', 'capital_loss', 'hours_per_week',
                'native_country_cat']

scaled_features = {}
for each in num_features:
    mean, std = adult_df_rev[each].mean(), adult_df_rev[each].std()
    scaled_features[each] = [mean, std]
    # Assign the whole column so the integer column can become float
    adult_df_rev[each] = (adult_df_rev[each] - mean) / std

features = adult_df_rev.values[:, :14]
target = adult_df_rev.values[:, 14]
features_train, features_test, target_train, target_test = train_test_split(features,
                                                                            target, test_size=0.33, random_state=10)
clf = GaussianNB()
clf.fit(features_train, target_train)
target_pred = clf.predict(features_test)


print(target_pred)

print(accuracy_score(target_test, target_pred, normalize = True))
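One thing worth noting about the code above: it reuses a single `LabelEncoder` for every column, so after the last `fit_transform` call `le` only remembers the `native_country` classes and the earlier string-to-integer mappings are gone. A minimal sketch of keeping one fitted encoder per column is shown below. The `toy_df` frame, its values, and the `encoders` dict are hypothetical stand-ins for illustration, not part of the tutorial.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame standing in for adult_df (made-up values, for illustration only)
toy_df = pd.DataFrame({
    "workclass": ["State-gov", "Private", "Private", "Self-emp-not-inc"],
    "sex": ["Male", "Female", "Male", "Female"],
})

# Keep one fitted encoder per categorical column so the mapping
# can be reused on new rows at prediction time
encoders = {}
for col in ["workclass", "sex"]:
    encoders[col] = LabelEncoder()
    toy_df[col + "_cat"] = encoders[col].fit_transform(toy_df[col])

# A value seen during fitting maps to the same integer later on
state_gov_code = int(encoders["workclass"].transform(["State-gov"])[0])
```

With the encoders kept this way, a new raw value can be passed through `encoders[col].transform(...)` and then scaled with the mean and standard deviation already stored in `scaled_features`.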

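Suppose I now receive the following data from somewhere (39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K) and want to use the model to make a prediction for it. Could you show me how to do this?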

I tried clf.predict([[39, "State-gov", 77516, "Bachelors", 13, "Never-married", "Adm-clerical", "Not-in-family", "White", "Male", 2174, 0, 40, "United-States", "<=50K"]]), but all I got was a long list of errors.
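The call above most likely fails for two reasons: the row still contains raw strings, and it has 15 values (including the `<=50K` label), while the model was trained on 14 numeric features that had been label-encoded and standardized. A new row has to go through the same encoding and scaling before `predict`. One way to guarantee that, sketched here under assumptions, is to wrap the preprocessing and the classifier in a scikit-learn `Pipeline`. This is an alternative to the tutorial's manual steps, not the tutorial's own method. The toy rows, the reduced column set, and the names `train_df`, `feature_cols`, `cat_cols`, and `model` are all hypothetical, and `StandardScaler` uses the population standard deviation, so scaled values differ slightly from the pandas `.std()` used above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Toy training rows with a subset of the columns (made-up values, not real data)
train_df = pd.DataFrame({
    "age": [39, 50, 38, 53, 45, 60],
    "workclass": ["State-gov", "Self-emp-not-inc", "Private",
                  "Private", "Private", "State-gov"],
    "hours_per_week": [40, 13, 40, 40, 60, 55],
    "income": ["<=50K", "<=50K", "<=50K", "<=50K", ">50K", ">50K"],
})
feature_cols = ["age", "workclass", "hours_per_week"]
cat_cols = ["workclass"]

# Encode the string columns, standardize everything, then fit GaussianNB.
# The fitted pipeline remembers the category mappings and scaling statistics.
model = Pipeline([
    ("encode", ColumnTransformer([("cat", OrdinalEncoder(), cat_cols)],
                                 remainder="passthrough")),
    ("scale", StandardScaler()),
    ("nb", GaussianNB()),
])
model.fit(train_df[feature_cols], train_df["income"])

# A new raw row: same feature columns as training, and no label
new_row = pd.DataFrame([[39, "State-gov", 40]], columns=feature_cols)
prediction = model.predict(new_row)
print(prediction)
```

With its default settings `OrdinalEncoder` raises an error on a category it did not see during fitting, so new rows need values that appeared in the training data.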

0 answers:

No answers yet