I'm writing a Naive Bayes classifier by following this tutorial: http://machinelearningmastery.com/naive-bayes-classifier-scratch-python/
I keep getting this error:
dataset[i] = [float(x) for x in dataset[i]]
ValueError: could not convert string to float:
Here is the part of my code where the error occurs:
import csv

def loadDatasetNB(filename):
    lines = csv.reader(open(filename, "rt"))
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset
This is how the file is loaded:
def NB_Analysis():
    filename = 'fvectors.csv'
    splitRatio = 0.67
    dataset = loadDatasetNB(filename)
    trainingSet, testSet = splitDatasetNB(dataset, splitRatio)
    print('Split {0} rows into train={1} and test={2} rows'.format(len(dataset), len(trainingSet), len(testSet)))
    # prepare model
    summaries = summarizeByClassNB(trainingSet)
    # test model
    predictions = getPredictionsNB(summaries, testSet)
    accuracy = getAccuracyNB(testSet, predictions)
    print('Accuracy: {0}%'.format(accuracy))

NB_Analysis()
My file fvectors.csv looks like this:
What is going wrong here, and how do I fix it?
Answer 0 (score: 3)
Try skipping the header row: the empty header in the first column is causing the problem.
>>> float(' ')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not convert string to float:
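To see exactly which cells float() rejects, a small diagnostic can help. This is a minimal sketch; the helper name find_unconvertible_cells and the sample CSV text are hypothetical, chosen only to mimic a header with a blank first cell.

```python
import csv
import io


def find_unconvertible_cells(rows):
    """Return (row_index, column_index, value) for every cell float() rejects."""
    bad = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            try:
                float(value)
            except ValueError:
                bad.append((r, c, value))
    return bad


# Hypothetical sample: a header row with a blank first cell, then numeric rows.
sample = io.StringIO(" ,f1,f2\n1,2.5,3\n2,4.0,5\n")
print(find_unconvertible_cells(csv.reader(sample)))
# → [(0, 0, ' '), (0, 1, 'f1'), (0, 2, 'f2')]
```

If every reported cell is in row 0, the header is the only culprit and skipping it is enough.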
(1) If you want to skip the header, you can do it like this:
def loadDatasetNB(filename):
    lines = csv.reader(open(filename, "rt"))
    next(lines, None)  # <<- skip the headers
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset
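A self-contained variant of option (1) that can be tried without the real file. It is a sketch that assumes the header is the only non-numeric row; load_dataset_nb is a hypothetical name, and it takes an open file object instead of a filename so it can be fed in-memory text. It also skips blank lines, which csv.reader yields as empty lists.

```python
import csv
import io


def load_dataset_nb(fileobj):
    """Parse CSV rows into lists of floats, skipping the first (header) row."""
    reader = csv.reader(fileobj)
    next(reader, None)  # skip the header; the None default avoids StopIteration on an empty file
    # `if row` drops blank lines, which csv.reader returns as []
    return [[float(x) for x in row] for row in reader if row]


sample = io.StringIO(" ,f1,f2\n1,2.5,3\n2,4.0,5\n")
print(load_dataset_nb(sample))  # → [[1.0, 2.5, 3.0], [2.0, 4.0, 5.0]]
```

With the real file this would be used as `with open('fvectors.csv', newline='') as f: dataset = load_dataset_nb(f)`.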
(2) Or you can ignore the exception:
try:
    float(element)
except ValueError:
    pass
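If you go with option (2), make sure that only the first row, or only rows that contain text, get skipped, and that you know this for certain.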
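One way to follow that advice is to drop any row that fails to convert but record its index, so you can confirm afterwards that only the header (or other known text rows) was skipped. A minimal sketch; load_numeric_rows and the sample data are hypothetical.

```python
import csv
import io


def load_numeric_rows(fileobj):
    """Keep rows whose every cell converts to float; return (rows, skipped_indexes)."""
    rows, skipped = [], []
    for index, row in enumerate(csv.reader(fileobj)):
        if not row:
            continue  # blank line: nothing to convert
        try:
            rows.append([float(x) for x in row])
        except ValueError:
            skipped.append(index)  # remember which row was dropped
    return rows, skipped


data, skipped = load_numeric_rows(io.StringIO(" ,f1\n1,2\n3,oops\n4,5\n"))
print(data)     # → [[1.0, 2.0], [4.0, 5.0]]
print(skipped)  # → [0, 2]
```

Here row 2 was dropped as well, which is exactly the kind of silent data loss the answer warns about; checking `skipped` makes it visible.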
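Answer 1 (score: 1)
Looking at the image of your data, Python cannot convert the last column, because it contains the values square and circle. You also need to skip the header row.
Try this code: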
import csv

def loadDatasetNB(filename):
    with open(filename, 'r') as fp:
        reader = csv.reader(fp)
        # skip the header line
        header = next(reader)
        # save the features and the labels as different lists
        data_features = []
        data_labels = []
        for row in reader:
            # convert everything except the label to a float
            data_features.append([float(x) for x in row[:-1]])
            # save the labels separately
            data_labels.append(row[-1])
    return data_features, data_labels
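If the rest of the pipeline expects each row to end with a numeric class value (for example, code that computes a mean over every column), one option is to map each text label to an integer id and append it to the feature row. This is a sketch under that assumption; encode_labels is a hypothetical helper, and ids are assigned in order of first appearance.

```python
def encode_labels(features, labels):
    """Append an integer class id to each feature row; return (rows, label_to_id)."""
    label_to_id = {}
    rows = []
    for vector, label in zip(features, labels):
        # first time a label is seen it gets the next free id: 0, 1, 2, ...
        class_id = label_to_id.setdefault(label, len(label_to_id))
        rows.append(vector + [float(class_id)])  # new list; the input is not modified
    return rows, label_to_id


rows, mapping = encode_labels([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                              ["square", "circle", "square"])
print(rows)     # → [[1.0, 2.0, 0.0], [3.0, 4.0, 1.0], [5.0, 6.0, 0.0]]
print(mapping)  # → {'square': 0, 'circle': 1}
```

Keeping `mapping` lets you translate predicted ids back to "square" or "circle".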
Answer 2 (score: 0)
Your data contains an empty value:
>>> float('')
ValueError: could not convert string to float:
You can check the value before converting it:
dataset[i] = [float(x) for x in dataset[i] if x != '']
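Note that filtering out '' drops individual cells, so a row with an empty field becomes shorter than the others and its later values shift one column to the left. If column positions matter, a sketch that skips the whole row instead (parse_complete_rows is a hypothetical name; non-empty text such as a header cell would still raise ValueError):

```python
def parse_complete_rows(rows):
    """Convert rows to floats, skipping any row that is blank or has an empty field."""
    dataset = []
    for row in rows:
        if not row or any(x.strip() == "" for x in row):
            continue  # blank line or missing value: drop the whole row
        dataset.append([float(x) for x in row])
    return dataset


print(parse_complete_rows([["1", "2"], ["3", ""], [], ["5", "6"]]))
# → [[1.0, 2.0], [5.0, 6.0]]
```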
Answer 3 (score: 0)
You are passing strings to the float constructor, which raises a ValueError unless the string represents a valid number:
dataset[i] = [float(x) for x in dataset[i]]
Instead of a list comprehension, it may be better to use a for loop, so that you can handle this case more easily:
data = []
for x in dataset[i]:
    try:
        value = float(x)
    except ValueError:
        value = x
    data.append(value)
dataset[i] = data
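The same loop can be wrapped in a function and applied to every row. A minimal sketch; convert_row is a hypothetical name. Because unconvertible cells are kept as strings, rows can end up with mixed types, so any later code that does arithmetic on every column would still need to skip or encode those cells.

```python
def convert_row(row):
    """Convert each cell to float where possible; keep the original string otherwise."""
    converted = []
    for x in row:
        try:
            value = float(x)
        except ValueError:
            value = x  # e.g. a class label such as 'circle', or an empty cell
        converted.append(value)
    return converted


print(convert_row(["1.5", "2", "circle", ""]))  # → [1.5, 2.0, 'circle', '']
```

Applied to the whole dataset: `dataset = [convert_row(row) for row in dataset]`.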
For more on catching exceptions, see:
Try/Except in Python: How do you properly ignore Exceptions?