Trying to use sklearn

Time: 2018-12-06 02:54:32

Tags: python pandas numpy scikit-learn

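I am trying to fit a Gaussian naive Bayes (GaussianNB) model to a dataset I compiled for a project. My goal is to see whether a given set of features can predict a county's GINI index (usually between 0 and 1), and how accurately. I have visualized the dataset on my Tableau Public site, which also gives context for the data itself: https://public.tableau.com/profile/sandeep.mohan#!/vizhome/RisingIncomeInequalityintheUSsince2010/Story1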

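Here is my code so far. As you can see, I dropped all categorical and non-numeric columns (I am left with only integers and one float, the GINI index itself). I then tried to fit the model, but it raised a ValueError. So I went back and explicitly cast the target to float.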
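As an aside on the column-dropping step described above, here is a minimal sketch of keeping only numeric columns by dtype instead of listing them by hand. The tiny `df_demo` frame and its values are hypothetical stand-ins for `consdf_fin.csv`. Note that this would not remove numeric columns such as `GINI_Low` or `GINI_High`, which would still need an explicit `drop`.

```python
import pandas as pd

# Hypothetical miniature frame standing in for consdf_fin.csv;
# the values are made up for illustration only.
df_demo = pd.DataFrame({
    "FIPS": [1001, 1003],
    "County_Name": ["County A", "County B"],
    "year": [2010, 2010],
    "GINI_Index": [0.41, 0.45],
})

# Keep only numeric columns (int and float); object columns are dropped.
numeric_df = df_demo.select_dtypes(include="number")
print(numeric_df.columns.tolist())
```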

What am I doing wrong? The code is below. Thanks in advance for reviewing it!

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("consdf_fin.csv")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22544 entries, 0 to 22543
Data columns (total 38 columns):
FIPS                       22544 non-null int64
County_Name                22544 non-null object
African_American_Male      22544 non-null int64
African_American_Female    22544 non-null int64
Native_Male                22544 non-null int64
Native_Female              22544 non-null int64
Asian_Male                 22544 non-null int64
Asian_Female               22544 non-null int64
Hispanic_Male              22544 non-null int64
Hispanic_Female            22544 non-null int64
summary_est                22544 non-null float64
year                       22544 non-null int64
Other_Male                 22544 non-null int64
Other_Female               22544 non-null int64
White_Male                 22544 non-null int64
White_Female               22544 non-null int64
GINI_Index                 22544 non-null float64
GINI_Low                   22544 non-null float64
GINI_High                  22544 non-null float64
Total_Occupied_Housing     22544 non-null int64
Occupied_Housing_Owner     22544 non-null int64
Occupied_Housing_Renter    22544 non-null int64
Black_MI                   22544 non-null int64
Asian_MI                   22544 non-null int64
White_MI                   22544 non-null int64
Hispanic_MI                22544 non-null int64
County_Median_Income       22544 non-null int64
Other_MI                   22544 non-null int64
Total                      22544 non-null int64
Employed_BS_or_Less        22544 non-null int64
Employed_BS_or_More        22544 non-null int64
Unemployed_BS_or_Less      22544 non-null int64
Unemployed_BS_or_More      22544 non-null int64
Total_Educ_Emp             22544 non-null int64
In_Poverty_Male            22544 non-null int64
In_Poverty_Female          22544 non-null int64
pct_in_poverty             22544 non-null float64
Total_Poverty              22544 non-null int64
dtypes: float64(5), int64(32), object(1)
memory usage: 6.5+ MB

df.drop(columns=['County_Name','summary_est', 'GINI_Low', 'GINI_High', 'pct_in_poverty'],inplace=True)

collist = df.columns.tolist()
print(collist)
len(collist)

df= df[['FIPS', 'African_American_Male', 'African_American_Female', 'Native_Male', 
        'Native_Female', 'Asian_Male', 'Asian_Female', 'Hispanic_Male', 'Hispanic_Female', 
        'year', 'Other_Male', 'Other_Female', 'White_Male', 'White_Female', 
        'Total_Occupied_Housing', 'Occupied_Housing_Owner', 'Occupied_Housing_Renter', 
        'Black_MI', 'Asian_MI', 'White_MI', 'Hispanic_MI', 'County_Median_Income', 'Other_MI', 'Total', 
        'Employed_BS_or_Less', 'Employed_BS_or_More', 'Unemployed_BS_or_Less', 'Unemployed_BS_or_More', 
        'Total_Educ_Emp', 'In_Poverty_Male', 'In_Poverty_Female', 'Total_Poverty','GINI_Index']]


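# 33 columns remain: indices 0-31 are features, index 32 is GINI_Index.
# A slice end is exclusive, so 0:32 is needed to include Total_Poverty.
features = df.values[:, 0:32]
target = df.values[:, 32]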
features_train, features_test, target_train, target_test = train_test_split(features, target, test_size = 0.20, random_state = 10)

target_train = target_train.astype('float')
target_test = target_test.astype('float')

clf = GaussianNB()
clf.fit(features_train, target_train)
target_pred = clf.predict(features_test)
accuracy_score(target_test, target_pred)
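
Positional slices are easy to get wrong by one, because the end of a slice is exclusive. A small sketch of selecting the features and the target by column name instead, so that no counting is required. The `df_demo` frame and its values are hypothetical; only the column names are taken from the question.

```python
import pandas as pd

# Hypothetical stand-in using a few of the question's column names;
# the numbers are invented for illustration.
df_demo = pd.DataFrame({
    "FIPS": [1001, 1003, 1005],
    "year": [2010, 2010, 2010],
    "Total_Poverty": [7000, 25000, 6500],
    "GINI_Index": [0.41, 0.45, 0.47],
})

target_col = "GINI_Index"

# Everything except the target column becomes a feature.
features = df_demo.drop(columns=[target_col]).to_numpy()
# The target is pulled out by name and cast to float in one step.
target = df_demo[target_col].to_numpy(dtype=float)

print(features.shape, target.shape)
```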



ValueError                                Traceback (most recent call last)
<ipython-input-14-a3d0dedcdf18> in <module>()
      1 clf = GaussianNB()
----> 2 clf.fit(features_train, target_train)
      3 target_pred = clf.predict(features_test)
      4 accuracy_score(target_test, target_pred)

ValueError: Unknown label type: (array([0.2001, 0.207 , 0.304 , ..., 0.626 , 0.645 , 0.6519]),)
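
A note on the error above: `GaussianNB` is a classifier, so it expects discrete class labels, and a continuous float target such as `GINI_Index` is the usual cause of `Unknown label type`. Casting the target to float does not help, since the values are still continuous. `accuracy_score` is likewise a classification metric. The following is only a sketch of two possible directions, run on synthetic data (an assumption, not the county dataset): treating the task as regression, or binning the target into classes first. `LinearRegression` and the three quantile bins are illustrative choices, not something the question prescribes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-in data (an assumption, not the county dataset).
rng = np.random.default_rng(10)
features = rng.normal(size=(500, 4))
target = 0.45 + 0.03 * features[:, 0] + rng.normal(scale=0.01, size=500)

# scikit-learn's own view of the target: non-integer floats are
# reported as 'continuous', which classifiers do not accept.
print(type_of_target(target))

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.20, random_state=10)

# Option 1: treat the task as regression and use regression metrics.
reg = LinearRegression().fit(X_train, y_train)
y_pred = reg.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("MAE:", mae, "R^2:", r2)

# Option 2: bin the target into three classes (low / middle / high)
# using quantiles of the training target, then classify.
edges = np.quantile(y_train, [1 / 3, 2 / 3])
clf = GaussianNB().fit(X_train, np.digitize(y_train, edges))
class_pred = clf.predict(X_test)
acc = (class_pred == np.digitize(y_test, edges)).mean()
print("Binned accuracy:", acc)
```

With regression, "how accurately" is answered by an error measure such as MAE or R^2 rather than an accuracy percentage. Binning keeps `GaussianNB` usable but discards the ordering and spacing of the GINI values within each bin.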

0 answers:

No answers