ValueError: could not convert string to float: 'none' when reading CSV data

Time: 2019-03-18 11:57:39

Tags: python pandas missing-data

I am reading CSV data to build a model.

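I understand how missing values should be handled, so I filled them with the median or with zero, and I dropped a few columns that are not relevant.

I also checked the CSV file by hand, applying a filter for empty values, and tried to fill every field that was empty. I still cannot get past the error.

Here is my code: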

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

dataset = pd.read_csv("model__newdata.csv",header = 0)

#Data Pre-processing
data = dataset.drop('shift_location_id', axis=1)
data = data.drop('status', axis=1)
data = data.drop('city', axis=1)
data = data.drop('open_positions', axis=1)

#Find median for features having NaN
median_role_id = data['role_id'].median()
median_specialty_id = data['specialty_id'].median()
median_shift_id = data['shift_id'].median()

data['shift_id'] = data['shift_id'].fillna(median_shift_id)
data['role_id'] = data['role_id'].fillna(median_role_id)
data['specialty_id'] = data['specialty_id'].fillna(median_specialty_id)
data['years_of_experience'] = data['years_of_experience'].fillna(0)
data['specialty_id'] = data['specialty_id'].fillna(0)  # no effect: already filled with the median above

#Start training

labels = dataset.shift_location_id
train1 = data
algo = LinearRegression()
x_train , x_test , y_train , y_test = train_test_split(train1 , labels , test_size = 0.20,random_state =1)

# x_train.to_csv("x_train.csv", sep=',', encoding='utf-8')
# x_test.to_csv("x_test.csv", sep=',', encoding='utf-8')

algo.fit(x_train,y_train)
algo.score(x_test,y_test)

Error:

ValueError                                Traceback (most recent call last)
<ipython-input-27-99f96096832a> in <module>
     32 # x_test.to_csv("x_test.csv", sep=',', encoding='utf-8')
     33 
---> 34 algo.fit(x_train,y_train)

ValueError: could not convert string to float: 'none'
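
The message indicates that a literal string 'none' reached the estimator where a number was expected. A minimal sketch of one way to locate such cells follows. The frame, its values, and the helper name find_non_numeric are made up for illustration and are not taken from model__newdata.csv.

```python
import pandas as pd

# Hypothetical stand-in for the real data; 'nurse_zip' holds a text placeholder.
sample = pd.DataFrame({
    "role_id": [1.0, 2.0, 3.0],
    "nurse_zip": ["94110", "none", "10001"],
})

def find_non_numeric(frame):
    """Return {column: [offending values]} for cells that cannot be parsed as numbers."""
    problems = {}
    for col in frame.columns:
        converted = pd.to_numeric(frame[col], errors="coerce")
        # Cells that were present originally but became NaN after coercion are non-numeric text.
        bad = frame.loc[converted.isna() & frame[col].notna(), col]
        if not bad.empty:
            problems[col] = bad.unique().tolist()
    return problems

non_numeric = find_non_numeric(sample)
print(non_numeric)
```

Running the same helper on the real `data` frame would list every column that still contains text, which narrows down where the 'none' value comes from.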

Any suggestions on how to solve this?

Edit 1 - Sample data: https://gist.githubusercontent.com/karimkhanvi/d69c98352aaaaed87f787a20c05307f8/raw/a45bb471fc1ee5095a1d0c3809a8362c001f639e/temp.csv

Edit 2 - Before posting, I had already checked the question "ValueError: could not convert string to float: id".

I would appreciate it if you would note that my problem is not with the data type of any parameter.

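ValueError: could not convert string to float: 'none'

My problem is caused by empty values. I have already tried the fixes I could find and they did not solve it, which is why I posted this question.

Edit 3 - I tried to check whether any value is null: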

data.isnull().values.any()
data.isnull().sum()

The first call returns False, and the second returns:

shift_id                 0
user_id                  0
shift_organization_id    0
shift_department_id      0
role_id                  0
specialty_id             0
years_of_experience      0
nurse_zip                0
shifts_zip               0
dtype: int64
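
One possible explanation for the all-zero counts is that isnull() only flags real missing values (NaN or None), not text placeholders. The sketch below illustrates this under the assumption that some cell in the file contains the literal text none. The inline CSV is made up, and the column names are borrowed from the output above only for readability.

```python
import io
import pandas as pd

# Made-up CSV text; 'none' here is ordinary text, not a missing value.
csv_text = "role_id,years_of_experience\n1,5\n2,none\n3,2\n"

# Lowercase 'none' is not among read_csv's default NA strings, so it stays a string.
raw = pd.read_csv(io.StringIO(csv_text))
raw_missing = int(raw.isnull().sum().sum())

# na_values tells read_csv to treat the listed strings as missing.
parsed = pd.read_csv(io.StringIO(csv_text), na_values=["none"])
parsed_missing = int(parsed.isnull().sum().sum())

# With the placeholder parsed as NaN, the column is numeric and fillna can act on it.
filled = parsed["years_of_experience"].fillna(0)
print(raw_missing, parsed_missing, filled.tolist())
```

If that assumption holds for the real file, passing na_values=["none"] to the existing pd.read_csv call would let the fillna steps in the code above reach those cells.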

0 answers:

No answers yet.