KNN distance and class voting

Time: 2015-03-23 23:14:07

Tags: python numpy knn

Please tell me how to correctly calculate the distances for each point in testData.

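Right now I only get a single value. I should instead get the distance to every point in the dataset and then be able to assign each test point a class. I have to use numpy.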
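One way to get every distance at once: the posted calculateDistance compares a single test point against all training rows. If you add an axis to the test array, NumPy broadcasting produces the full matrix of test-to-train distances in one step. Below is a minimal sketch. The function name pairwise_distances and the toy arrays are illustrative and do not come from the question.

```python
import numpy as np


def pairwise_distances(test_data, train_data):
    """Return an (n_test, n_train) matrix of Euclidean distances.

    Entry [i, j] is the distance from test point i to training point j.
    """
    # test_data[:, None, :] has shape (n_test, 1, n_features) and
    # train_data[None, :, :] has shape (1, n_train, n_features);
    # broadcasting yields an (n_test, n_train, n_features) array of differences.
    diff = test_data[:, np.newaxis, :] - train_data[np.newaxis, :, :]
    # Sum squared differences over the feature axis, then take the square root
    return np.sqrt(np.sum(diff ** 2, axis=-1))


train = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
test = np.array([[0.0, 0.0], [3.0, 0.0]])
dist = pairwise_distances(test, train)
print(dist.shape)            # (2, 3)
print(dist.argsort(axis=1))  # each row: training indices, nearest first
```

Row i of `dist.argsort(axis=1)` lists training indices from nearest to farthest for test point i. Slicing it with `[:, :k]` therefore gives the k nearest neighbours of every test point. The intermediate array has n_test × n_train × n_features elements, which is small for a dataset the size of iris.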

EDIT: The problem now is that I get the following error, and I don't know how to fix it:

KeyError: 0

I am trying to compute the accuracy of the predicted class labels. Any ideas?
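Accuracy needs one predicted label per test point, compared position by position against the true labels of the test points. The posted testAccuracy has three problems:

- classCount is the dictionary returned by classifyKnn. Its keys are class names, so indexing it with the integer x = 0 raises KeyError: 0.
- testData holds only the normalized features, so testData[x][-1] is the last feature value, not a class.
- `is` tests object identity rather than string equality.

Below is a minimal sketch. It assumes the predicted and true labels are equal-length arrays of strings. The helper name accuracy and the sample labels are illustrative.

```python
import numpy as np


def accuracy(predicted, actual):
    """Percentage of positions where the predicted label equals the true label."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    # Element-wise == compares string values; `is` would compare object identity
    return np.mean(predicted == actual) * 100.0


pred = np.array(["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"])
actual_labels = np.array(["Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-setosa"])
print(accuracy(pred, actual_labels))  # 75.0
```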

import matplotlib.pyplot as plt
import random
import numpy as np
import operator
from sklearn.model_selection import train_test_split
# In[1]
def readFile():
    with open('iris.data', 'r') as f:
        d = np.dtype([('features', float, (4,)), ('class', np.str_, 20)])
        data = np.genfromtxt(f, dtype=d, delimiter=",")
    dataPoints = data['features']
    labels = data['class']
    return dataPoints, labels
# In[2]
def normalizeData(dataPoints):
    # normalize the data so the values will be between 0 and 1
    dataPointsNorm = (dataPoints - dataPoints.min())/(dataPoints.max() - dataPoints.min())
    return dataPointsNorm

def crossVal(dataPointsNorm):
    # splitting into train and test sets for cross-validation
    trainData, testData = train_test_split(dataPointsNorm, test_size=0.20, random_state=25)
    return trainData, testData

def calculateDistance(trainData, testData):
    # Euclidean distance calculation on numpy arrays
    distance = np.sqrt(np.sum((trainData - testData)**2, axis=-1))
    # argsort returns the indices from the closest to the furthest neighbour, in ascending order
    sortDistance = distance.argsort()
    return distance, sortDistance
# In[4]
def classifyKnn(testData, trainData, labels, k):
    # Calculating the nearest neighbours and assigning the class by majority vote
    classCount = {}
    for i in range(k):
        distance, sortedDistIndices = calculateDistance(trainData, testData[i])
        voteLabel = labels[sortedDistIndices][i]
        # print(voteLabel)
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
        print('Class Count: ', classCount)
    # Sorting the dictionary to return the voted class
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0], classCount

def testAccuracy(testData, classCount):
    correct = 0
    for x in range(len(testData)):
        print('HERE !!!!!!!!!!!!!!')
        if testData[x][-1] is classCount[x]:
            correct += 1
    return (correct/float(len(testData))) * 100.0

def main():
    dataPoints, labels = readFile()
    dataPointsNorm = normalizeData(dataPoints)
    trainData, testData = crossVal(dataPointsNorm)
    result, classCount = classifyKnn(testData, trainData, labels, 5)
    print(result)
    accuracy = testAccuracy(testData, classCount)
    print(accuracy)

main()
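In classifyKnn, the loop runs over range(k) and uses i in two roles. It is a test-point index (testData[i]) and also a neighbour rank (labels[sortedDistIndices][i]), so a single vote mixes k different test points. A k-NN vote is usually done per test point:

1. Compute the distances from that point to all training rows.
2. Take the k smallest.
3. Count the labels of those k neighbours.

Below is a minimal sketch. It assumes train_labels holds the labels of the training rows in the same order as train_data. The helper names and toy data are hypothetical.

```python
from collections import Counter

import numpy as np


def classify_point(point, train_data, train_labels, k):
    """Predict the label of one point by majority vote of its k nearest neighbours."""
    # Distance from this single point to every training row (shape: (n_train,))
    distances = np.sqrt(np.sum((train_data - point) ** 2, axis=1))
    # Indices of the k closest training rows, nearest first
    nearest = distances.argsort()[:k]
    # Count how often each label occurs among those neighbours
    votes = Counter(train_labels[nearest])
    # most_common(1) returns [(label, count)] for the top-voted label
    return votes.most_common(1)[0][0]


def classify_all(test_data, train_data, train_labels, k):
    """Return an array with one predicted label per test row."""
    return np.array([classify_point(p, train_data, train_labels, k) for p in test_data])


train = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]])
labels = np.array(["a", "a", "b", "b", "b"])
test = np.array([[0.05, 0.0], [0.95, 0.95]])
print(classify_all(test, train, labels, k=3))  # ['a' 'b']
```

Counter.most_common orders equal counts by first occurrence. A tie therefore goes to the label that appears first among the sorted neighbours. With three iris classes, even an odd k such as 5 can still produce a 2-2-1 tie.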

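I normalize the data, split it into training and test sets, and calculate the distances (this is the part that goes wrong).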
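Two steps in that pipeline affect the result.

- normalizeData uses the global minimum and maximum of the whole array, so all four features share one scale. Min-max scaling is more commonly done per feature, that is, per column with axis=0.
- crossVal splits only the feature array, so labels still covers every row of the file in its original order and no longer lines up with trainData or testData.

scikit-learn's train_test_split accepts several arrays and applies the same shuffle to all of them. For example, train_test_split(X, y, test_size=0.20, random_state=25) returns X_train, X_test, y_train, y_test.

The sketch below does the same with NumPy alone so it runs without scikit-learn. The helper names are hypothetical. Because it uses NumPy's own random generator, it will pick different rows than scikit-learn with random_state=25.

```python
import numpy as np


def normalize_columns(data):
    """Min-max scale each feature (column) to [0, 1]; assumes no constant column."""
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    return (data - col_min) / (col_max - col_min)


def split_with_labels(data, labels, test_size=0.20, seed=25):
    """Shuffle rows once and split features and labels with the same permutation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))
    n_test = int(round(len(data) * test_size))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return data[train_idx], data[test_idx], labels[train_idx], labels[test_idx]


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]])
y = np.array(["a", "a", "b", "b", "b"])
X_norm = normalize_columns(X)
X_train, X_test, y_train, y_test = split_with_labels(X_norm, y)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)  # (4, 2) (1, 2) (4,) (1,)
```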

Thanks for any hints.

0 answers