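I am trying to fit a GaussianNB model to a dataset I compiled for a project. My goal is to see whether a given set of features can predict a county's GINI index (which generally lies between 0 and 1), and how accurately. I have visualized the dataset, and it can be viewed on my Tableau Public site at https://public.tableau.com/profile/sandeep.mohan#!/vizhome/RisingIncomeInequalityintheUSsince2010/Story1. The site also gives context for the data itself.
Here is my code so far. As you can see, I drop all categorical and non-numeric columns (I am left with only integers and one float, the GINI index itself). I then tried to fit the model, but it raised a ValueError. So I went back and explicitly cast the target to float, which did not help.
What am I doing wrong? The code is below. Thanks in advance for taking a look!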
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
df = pd.read_csv("consdf_fin.csv")
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22544 entries, 0 to 22543
Data columns (total 38 columns):
FIPS 22544 non-null int64
County_Name 22544 non-null object
African_American_Male 22544 non-null int64
African_American_Female 22544 non-null int64
Native_Male 22544 non-null int64
Native_Female 22544 non-null int64
Asian_Male 22544 non-null int64
Asian_Female 22544 non-null int64
Hispanic_Male 22544 non-null int64
Hispanic_Female 22544 non-null int64
summary_est 22544 non-null float64
year 22544 non-null int64
Other_Male 22544 non-null int64
Other_Female 22544 non-null int64
White_Male 22544 non-null int64
White_Female 22544 non-null int64
GINI_Index 22544 non-null float64
GINI_Low 22544 non-null float64
GINI_High 22544 non-null float64
Total_Occupied_Housing 22544 non-null int64
Occupied_Housing_Owner 22544 non-null int64
Occupied_Housing_Renter 22544 non-null int64
Black_MI 22544 non-null int64
Asian_MI 22544 non-null int64
White_MI 22544 non-null int64
Hispanic_MI 22544 non-null int64
County_Median_Income 22544 non-null int64
Other_MI 22544 non-null int64
Total 22544 non-null int64
Employed_BS_or_Less 22544 non-null int64
Employed_BS_or_More 22544 non-null int64
Unemployed_BS_or_Less 22544 non-null int64
Unemployed_BS_or_More 22544 non-null int64
Total_Educ_Emp 22544 non-null int64
In_Poverty_Male 22544 non-null int64
In_Poverty_Female 22544 non-null int64
pct_in_poverty 22544 non-null float64
Total_Poverty 22544 non-null int64
dtypes: float64(5), int64(32), object(1)
memory usage: 6.5+ MB
df.drop(columns=['County_Name','summary_est', 'GINI_Low', 'GINI_High', 'pct_in_poverty'],inplace=True)
collist = df.columns.tolist()
print(collist)
len(collist)
df= df[['FIPS', 'African_American_Male', 'African_American_Female', 'Native_Male',
'Native_Female', 'Asian_Male', 'Asian_Female', 'Hispanic_Male', 'Hispanic_Female',
'year', 'Other_Male', 'Other_Female', 'White_Male', 'White_Female',
'Total_Occupied_Housing', 'Occupied_Housing_Owner', 'Occupied_Housing_Renter',
'Black_MI', 'Asian_MI', 'White_MI', 'Hispanic_MI', 'County_Median_Income', 'Other_MI', 'Total',
'Employed_BS_or_Less', 'Employed_BS_or_More', 'Unemployed_BS_or_Less', 'Unemployed_BS_or_More',
'Total_Educ_Emp', 'In_Poverty_Male', 'In_Poverty_Female', 'Total_Poverty','GINI_Index']]
# 32 feature columns (indices 0-31); GINI_Index is the last column (index 32)
features = df.values[:,0:32]
target = df.values[:,32]
features_train, features_test, target_train, target_test = train_test_split(features, target, test_size = 0.20, random_state = 10)
target_train = target_train.astype('float')
target_test = target_test.astype('float')
clf = GaussianNB()
clf.fit(features_train, target_train)
target_pred = clf.predict(features_test)
accuracy_score(target_test, target_pred)
ValueError Traceback (most recent call last)
<ipython-input-14-a3d0dedcdf18> in <module>()
1 clf = GaussianNB()
----> 2 clf.fit(features_train, target_train)
3 target_pred = clf.predict(features_test)
4 accuracy_score(target_test, target_pred)
ValueError: Unknown label type: (array([0.2001, 0.207 , 0.304 , ..., 0.626 , 0.645 , 0.6519]),)
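A possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: `GaussianNB` is a classifier, so scikit-learn expects the target to hold discrete class labels, and a continuous float column such as `GINI_Index` is rejected no matter how it is cast with `astype('float')`. The example below uses small synthetic data (the columns `feat_a` and `feat_b` and all generated values are made up for illustration and are not taken from `consdf_fin.csv`). It tries the same fit, then shows two alternatives: a regressor scored with R² (`LinearRegression` is an arbitrary choice here), and `GaussianNB` on a target that has first been binned into classes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the real data (hypothetical columns and values).
rng = np.random.default_rng(10)
demo = pd.DataFrame({
    "feat_a": rng.normal(size=200),
    "feat_b": rng.normal(size=200),
})
# Continuous target squeezed into (0, 1), loosely mimicking a GINI-like index.
demo["GINI_Index"] = 1 / (1 + np.exp(-(0.8 * demo["feat_a"] - 0.5 * demo["feat_b"])))

# Select columns by name rather than by position to avoid off-by-one slices.
X = demo.drop(columns="GINI_Index").to_numpy()
y = demo["GINI_Index"].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=10
)

# 1) Fitting a classifier on a continuous target is expected to raise ValueError.
try:
    GaussianNB().fit(X_train, y_train)
    classifier_error = None
except ValueError as exc:
    classifier_error = str(exc)

# 2) A regressor accepts a continuous target; score it with a regression metric.
reg = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, reg.predict(X_test))

# 3) If classification is really the goal, bin the target into classes first.
cut = np.median(y_train)  # threshold computed from the training split only
y_train_cls = (y_train > cut).astype(int)
y_test_cls = (y_test > cut).astype(int)
clf = GaussianNB().fit(X_train, y_train_cls)
acc = accuracy_score(y_test_cls, clf.predict(X_test))

print("classifier error:", classifier_error)
print("regression R^2:", r2)
print("binned-target accuracy:", acc)
```

Note that `accuracy_score` is a classification metric, so it only makes sense in the binned variant; for a continuous target, a regression metric such as R² or mean absolute error is the usual choice.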