Training and testing to predict HealthScores

Time: 2019-10-15 10:39:26

Tags: python naivebayes sklearn-pandas

Print helper

def display(mess, values):
    print()
    print("-----", mess, "-----")
    print(values)
    print("------------------------")

Libraries

import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

Load the dataset

health_data = pd.read_csv("C:/Users/??/Downloads/Population(1).csv")

Test and train split (percentage)

health_train, health_test = train_test_split(health_data, test_size=0.1) 
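
To illustrate what the split above does, here is a minimal sketch on a small made-up DataFrame. The `demo` frame, its values, and the `random_state` setting are assumptions for illustration and are not part of the original post. `test_size=0.1` holds out 10% of the rows, and fixing `random_state` makes the shuffle reproducible between runs.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the CSV: 20 rows, two made-up columns
demo = pd.DataFrame({"Age": range(20, 40), "Salary": range(1000, 21000, 1000)})

# test_size=0.1 keeps 10% of the rows for testing; random_state fixes the shuffle
demo_train, demo_test = train_test_split(demo, test_size=0.1, random_state=0)

print(len(demo_train), len(demo_test))
```

Without `random_state`, each run produces a different split, so any score computed on the test rows will vary from run to run.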

Dataset columns used for training and testing

f_train = health_train[['Age', 'Weight  in lbs', 'Height in Inch',
                    'Units of alcohol per day', 'Cigarettes per day', 'Maritial Status Num', 
                    'Additional People in household', 'Salary', 'ActiveNum']].copy()
f_test = health_test[['Age', 'Weight  in lbs', 'Height in Inch', 
                  'Units of alcohol per day', 'Cigarettes per day', 'Maritial Status Num', 
                  'Additional People in household', 'Salary', 'ActiveNum']].copy()

s_train = health_train[['Health Score (high is good)']].copy()
s_test  = health_test[['Health Score (high is good)']].copy()

display("features", f_train)
display("Health Score (high is good)", s_train)

Create a Naive Bayes classifier. By convention, clf means "classifier"

clf = GaussianNB()

Train the classifier to take the training features and learn how they relate to the training y (here, the health score)

# ravel() flattens the single-column target into the 1-D array that fit() expects
clf.fit(f_train, s_train.values.ravel())
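
The training step can be sketched on its own with a tiny made-up dataset. The frames `X` and `y`, their values, and the name `demo_clf` are assumptions for illustration only. The sketch passes the target as a 1-D array and predicts for a whole frame in one call rather than row by row.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB

# Hypothetical features and target, four rows only
X = pd.DataFrame({"Age": [25, 30, 60, 65], "Cigarettes per day": [0, 1, 20, 25]})
y = pd.DataFrame({"Health Score (high is good)": [90, 90, 40, 40]})

demo_clf = GaussianNB()
# The single-column target is flattened to 1-D before fitting
demo_clf.fit(X, y.values.ravel())

# predict() accepts the whole feature frame and returns one label per row
predictions = demo_clf.predict(X)
print(predictions)
```

Note that GaussianNB is a classifier, so each distinct score in the target is treated as a separate class label and predictions are always one of the scores seen during training.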

#correct = 0
#wrong = 0
for index, row in health_test.iterrows():
    prediction = clf.predict([row[['Age', 'Weight  in lbs', 'Height in Inch',
                                   'Units of alcohol per day', 'Cigarettes per day', 'Maritial Status Num', 
                                   'Additional People in household', 'Salary', 'ActiveNum']]])

    #diff = abs(row['Health Score (high is good)'] - prediction[0])
    #if (diff < 10):
    #    correct = correct + 1
    #else:
    #    wrong = wrong + 1

print("Number of columns ", len(s_test.columns))
print("Number of rows", s_test.shape[0])

#total = correct + wrong

#print("Correct ", correct, " wrong", wrong)
#print("Total   ", total,   " percentage right", (correct*100)/total,"%")
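
The commented-out tally could also be computed for all test rows at once instead of inside a loop. A minimal sketch, assuming made-up score arrays: `actual` and `predicted` are hypothetical names, and in the post they would presumably come from `s_test` and `clf.predict(f_test)`. The 10-point tolerance is taken from the commented code above.

```python
import numpy as np

# Hypothetical true and predicted scores, for illustration only
actual = np.array([70, 55, 90, 40])
predicted = np.array([65, 80, 88, 52])

# A prediction counts as "correct" when it is within 10 points of the true score
diff = np.abs(actual - predicted)
correct = int((diff < 10).sum())
wrong = len(diff) - correct
total = correct + wrong

print("Correct ", correct, " wrong", wrong)
print("Total   ", total, " percentage right", (correct * 100) / total, "%")
```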

0 answers:

No answers