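I'm trying to implement the KNN algorithm from scratch, but I'm getting a very strange error: "KeyError: 0".
I think this means I have an empty dictionary somewhere, but I don't see how that can be. To be clear, I should add that the data works fine with a black-box KNN algorithm, so the problem must be somewhere in my code...
Here is my code: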
import numpy as np
import pandas as pd
import csv
import scipy.stats as stats
import math
from collections import Counter
import operator
from operator import itemgetter
"""Training features dataset"""
filenametrain_data = 'training_data.csv'
training_feature_set = pd.read_csv(filenametrain_data, header=None, usecols=range(1,13627))
"""Training labels dataset"""
filenametrain_label = 'training_labels.csv'
training_feature_label = pd.read_csv(filenametrain_label, header=None, usecols=[1], names=['Category'])
"""Split into training and testing datasets 90%/10%"""
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(training_feature_set, training_feature_label, test_size = 0.1, random_state=42)
"""KNN Model"""
def distance(X_train, y_train):
    dist = 0.0
    for i in range(len(X_train)):
        dist += pow((X_train[i] - y_train[i]), 2)
    return math.sqrt(dist)
def getNeighbors(X_train, y_train, X_test, k):
    distances = []
    for i in range(len(X_train)):
        dist = distance(X_test, X_train[i])
        distances.append((X_train[i], dist, y_train[i]))
    distances.sort(key=operator.itemgetter(1))
    neighbor = []
    for elem in range(k):
        neighbor.append((distances[elem][0], distances[elem][2]))
    return neighbor
def getResponse(neighbors):
    classVotes = {}
    for x in range(len(neighbors)):
        response = int(neighbors[x][-1])
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse = True)
    return sortedVotes[0][0]
"""Prediction"""
predictions = []
k = 4
for x in range(len(X_test)):
    neighbors = getNeighbors(X_train, y_train, y_test[x], k)
    result = getResponse(neighbors)
    predictions.append(result)
The error returned is:
Traceback (most recent call last):
  File "", line 2, in
    neighbors = getNeighbors(X_train, y_train, y_test[x], k)
  File "C:\ANACONDA\lib\site-packages\pandas\core\frame.py", line 1797, in __getitem__
    return self._getitem_column(key)
  File "C:\ANACONDA\lib\site-packages\pandas\core\frame.py", line 1804, in _getitem_column
    return self._get_item_cache(key)
  File "C:\ANACONDA\lib\site-packages\pandas\core\generic.py", line 1084, in _get_item_cache
    values = self._data.get(item)
  File "C:\ANACONDA\lib\site-packages\pandas\core\internals.py", line 2851, in get
    loc = self.items.get_loc(item)
  File "C:\ANACONDA\lib\site-packages\pandas\core\index.py", line 1572, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas\index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas\index.c:3824)
  File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:3704)
  File "pandas\hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12280)
  File "pandas\hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12231)
KeyError: 0
The dataset can be accessed here.
Answer 0 (score: 0)
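One thing worth checking first: the traceback in the question passes through DataFrame.__getitem__ and _getitem_column, which means y_test[x] is being treated as a lookup of a column labelled x. Since the label DataFrame has a single column named 'Category', there is no column 0. The sketch below reproduces that behaviour on a tiny stand-in DataFrame (the name labels and its values are made up for illustration, not taken from the dataset) and shows positional access with .iloc as one possible alternative.

```python
import pandas as pd

# Hypothetical stand-in for y_test: one column named 'Category',
# with shuffled row labels like those train_test_split leaves behind.
labels = pd.DataFrame({"Category": [3, 1, 2]}, index=[7, 0, 5])

try:
    labels[0]  # looks for a COLUMN labelled 0, not the first row
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# Positional access: the first row's label, regardless of the row index.
first_by_position = labels["Category"].iloc[0]

print(column_lookup_failed, first_by_position)
```

The same rule applies to X_train[i] inside getNeighbors: on a DataFrame it selects a column rather than a row, so row-wise code would need something like X_train.iloc[i], or the frames converted to NumPy arrays first.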
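Edit: You may have an extra character at the beginning of your csv file. Try specifying the encoding in your read_csv() call. See "encoding" at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
encoding : str, default None. Encoding to use for UTF when reading/writing (ex. 'utf-8'). List of Python standard encodings: https://docs.python.org/3/library/codecs.html#standard-encodings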
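As a rough sketch of what specifying the encoding could look like, assuming the extra character is a UTF-8 byte order mark (that is a guess, not something confirmed for this dataset): the 'utf-8-sig' codec strips the mark while decoding. The in-memory bytes below are made-up sample rows standing in for the real file.

```python
import io
import pandas as pd

# Hypothetical file contents: a UTF-8 byte order mark followed by two CSV rows.
raw = b"\xef\xbb\xbf" + b"1,0.5,0.25\n2,0.75,0.1\n"

# 'utf-8-sig' decodes UTF-8 and drops a leading byte order mark if present.
df = pd.read_csv(io.BytesIO(raw), header=None, encoding="utf-8-sig")

first_cell = df.iloc[0, 0]
print(df.shape, first_cell)
```

With a real file you would pass the path instead of the BytesIO object, for example pd.read_csv(filenametrain_data, header=None, encoding='utf-8-sig').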
You're using dot notation where you don't need it (in two places that I can see right away):
operator.itemgetter(1)
You have already imported itemgetter specifically:
from operator import itemgetter
So when you call itemgetter, just call it without the dot notation:
itemgetter(1)
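To be clear about the scope of this suggestion: because the question's code also has a plain import operator, both spellings resolve to the same function, so this is a tidy-up rather than something expected to change the KeyError. A small check, using a made-up vote dictionary in the style of getResponse:

```python
import operator
from operator import itemgetter

# Hypothetical class votes: class label -> number of neighbours voting for it.
votes = {2: 1, 0: 3, 1: 2}

# Sort by vote count (element 1 of each (label, count) pair), highest first.
by_module = sorted(votes.items(), key=operator.itemgetter(1), reverse=True)
by_name = sorted(votes.items(), key=itemgetter(1), reverse=True)

same_result = by_module == by_name
winner = by_name[0][0]
print(same_result, winner)
```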