Making Gaussian distribution code more "Pythonic"

Time: 2019-02-14 20:59:33

Tags: python

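I'm new to coding in Python. I've written some Python code to calculate a Gaussian distribution and predict the label for a set of values. It was a class assignment, and I got a good grade for it. Now I'd like to know whether my code is right from a more Pythonic point of view. Can I improve the code further to make it more precise and more "Pythonic"?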

import math
import operator 
# Function to get the mean and variance of all data points. Input parameters
# are the label (M or W) and the parameter to calculate (height, weight, age).
# samples is the number of data points.
def getMean(trainingSet,parameter1,parameter2):
    mean =0 
    samples = 0 
    variance = 0
    for x in range(len(trainingSet)):
        if trainingSet[x][3]==parameter2:
            mean+= trainingSet[x][parameter1]
            samples = samples+1
    finalMean = mean/samples
    #print(finalMean)
    for x in range(len(trainingSet)):
        if trainingSet[x][3]==parameter2:
            variance+= (trainingSet[x][parameter1]-finalMean)**2
    finalVariance = variance/samples
    gausVal = []
    for x in range(len(trainingSet)):
        tempval = calculateGuassian(finalMean,finalVariance,trainingSet[x][parameter1])
        gausVal.append(tempval)
    return gausVal

# Function to calculate the Gaussian distribution points

def calculateGuassian(meanVal, varianceVal, feature1):
    DenoVariance = 2*varianceVal
    func1 = 1/(math.sqrt(2*3.14*varianceVal))
    func2 = (-(feature1-meanVal)**2)/DenoVariance
    func3 = math.exp(func2)
    distro = func1*func3
    return distro

def finalProduct(multiplyer):
    result = 1
    for x in multiplyer: 
        result = result*x
    return result   

def arrayMultiply(arr1, arr2) :
    resultArray = []
    for x in range(len(arr1)):
        arrMul = arr1[x]*arr2[x]
        resultArray.append(arrMul)
    return resultArray  


# Main function where every feature is calculated and multiplied, and the
# result is shown

def main() :
    MenArr = []
    WomenList = []
    heightM = getMean(trainSet,0,'M')
    finalHM = finalProduct(heightM)
    MenArr.append(finalHM)
    heightW = getMean(trainSet,0, 'W')
    finalHW = finalProduct(heightW)
    WomenList.append(finalHW)
    weightM = getMean(trainSet,1,'M')
    finalWM = finalProduct(weightM)
    MenArr.append(finalWM)
    weightW = getMean(trainSet,1,'W')
    finalWW = finalProduct(weightW)
    WomenList.append(finalWW)
    ageM = getMean(trainSet,2,'M')
    finalAM = finalProduct(ageM)
    MenArr.append(finalAM)
    ageW = getMean(trainSet,2,'W')
    finalAW = finalProduct(ageW)
    WomenList.append(finalAW)
    BestResultMTemp = arrayMultiply(MenArr,testData)
    BestResultWTemp = arrayMultiply(WomenList,testData)
    BestResultM = finalProduct(BestResultMTemp)*0.50
    BestResultW = finalProduct(BestResultWTemp)*0.50
    print (BestResultM)
    print(BestResultW)
    if BestResultM<BestResultW :
        print("The Class Label Is W")
    if BestResultM>BestResultW :
        print("The Class Label Is M")


trainSet = [[170, 57, 32, 'W'],
[192, 95, 28, 'M'],
[150, 45, 30, 'W'],
[170, 65, 29, 'M'],
[175, 78, 35, 'M'],
[185, 90, 32, 'M'],
[170, 65, 28, 'W'],
[155, 48, 31, 'W'],
[160, 55, 30, 'W'],
[182, 80, 30, 'M'],
[175, 69, 28, 'W'],
[180, 80, 27, 'M'],
[160, 50, 31, 'W'],
[175, 72, 30, 'M']]     
testData = (175, 70, 35)        
main()

Any suggestions are most welcome. Thanks in advance.

2 answers:

Answer 0: (score: 1)

Your question title doesn't reflect what you are actually asking. You might be better off at http://codereview.stackexchange.com/ ...

However, at first glance:

Answer 1: (score: 1)

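As NichtJens mentioned, try to follow the PEP8 guidelines: use lowercase variable names with underscores to separate words, and add more whitespace between characters.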

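Also try to use consistent and meaningful variable names. For example, why do you have both MenArr and WomenList? You have a variable mean, but its value is a sum. You have temporary variables named func1, func2, and so on.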
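As a rough sketch of what that renaming could look like for calculateGuassian: the name gaussian_pdf and the local names normalizer and exponent are my own choices for illustration, not from the question. I have also swapped the hard-coded 3.14 for math.pi, which is an extra change on top of the renaming and will shift the results slightly.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance at x."""
    # 1 / sqrt(2 * pi * variance): the scaling factor in front of the exponential
    normalizer = 1 / math.sqrt(2 * math.pi * variance)
    # -(x - mean)^2 / (2 * variance): the argument of the exponential
    exponent = -((x - mean) ** 2) / (2 * variance)
    return normalizer * math.exp(exponent)


# Density of a height of 175 under a distribution centred on 175 (made-up numbers)
print(gaussian_pdf(175, 175, 25))
```

Each name now says what the value is, so the formula can be checked against a textbook definition line by line.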

To make your for loops more Pythonic, loop over the items in the list directly rather than creating an index and then looking the items up.

So instead of:

gauss_value = []
for x in range(len(lst)):
    value = calculate_guassian(mean, variance, lst[x][parameter1])
    gauss_value.append(value)
return gauss_value

you can do:

gauss_value = []
for item in lst:
    value = calculate_guassian(mean, variance, item[parameter1])
    gauss_value.append(value)
return gauss_value

But even better, you can use a list comprehension:

gauss_value = [calculate_guassian(mean, variance, item[parameter1]) for item in lst]

You can use this to simplify a lot of your code. For example, arrayMultiply could be:

def list_multiply(list_1, list_2):
    return [a * b for a, b in zip(list_1, list_2)]
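A quick usage example for the zip version, with made-up inputs. One behaviour worth knowing: zip stops at the shorter input, so lists of unequal length are silently truncated, whereas the index-based arrayMultiply would raise an IndexError if arr2 were shorter than arr1.

```python
def list_multiply(list_1, list_2):
    # Pair up the elements of both lists and multiply each pair
    return [a * b for a, b in zip(list_1, list_2)]


print(list_multiply([1, 2, 3], [4, 5, 6]))  # [4, 10, 18]
print(list_multiply([1, 2, 3], [4, 5]))     # [4, 10], the unmatched 3 is dropped
```

If a length mismatch should be an error, Python 3.10 and later accept zip(list_1, list_2, strict=True), which raises a ValueError instead of truncating.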

My version of getMean would filter the data first. I'm not sure whether it is correct for the calculateGuassian part to use the unfiltered data:

def get_mean(values, index, label):
    filtered_values = [value[index] for value in values if value[3] == label]
    n = len(filtered_values)

    mean = sum(filtered_values) / n
    summed_squared_difference = sum((val - mean) ** 2 for val in filtered_values)
    variance = summed_squared_difference / n

    return [calculateGuassian(mean, variance, item[index]) for item in values]
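If you are on Python 3.8 or later, the standard library's statistics module could take over the mean, variance and density calculations. This is only a sketch of that alternative, not a drop-in replacement: the name gaussian_values and the label_index parameter are made up for illustration, and NormalDist uses the full value of pi, so the numbers will differ slightly from code that uses 3.14.

```python
import math
from statistics import NormalDist, fmean, pvariance


def gaussian_values(rows, index, label, label_index=3):
    """Fit a normal distribution to column `index` of the rows carrying
    `label`, then evaluate its density at that column for every row."""
    column = [row[index] for row in rows if row[label_index] == label]
    mean = fmean(column)
    # pvariance is the population variance (divides by n), like the code above
    variance = pvariance(column, mean)
    # NormalDist takes the standard deviation, not the variance
    dist = NormalDist(mean, math.sqrt(variance))
    return [dist.pdf(row[index]) for row in rows]


# Three rows taken from the question's training set
sample = [[170, 57, 32, 'W'], [192, 95, 28, 'M'], [150, 45, 30, 'W']]
print(gaussian_values(sample, 0, 'W'))
```

Note that pdf raises an error when the variance is zero, for example when only one row carries the label.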

You can also greatly reduce the amount of code needed to build the initial lists:

men_values = [product(get_mean(trainSet, i, 'M')) for i in range(3)]
women_values = [product(get_mean(trainSet, i, 'W')) for i in range(3)]

You can reduce the repetition further by using a function that takes 'M' or 'W' as a parameter and returns the relevant list.
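One way such a function might look, as a self-contained sketch. The names class_values, gaussian and TRAIN_SET are my own, math.prod (Python 3.8+) stands in for the product helper, and the density uses math.pi rather than 3.14, so treat all of these as assumptions rather than the original code.

```python
import math

TRAIN_SET = [
    [170, 57, 32, 'W'], [192, 95, 28, 'M'], [150, 45, 30, 'W'],
    [170, 65, 29, 'M'], [175, 78, 35, 'M'], [185, 90, 32, 'M'],
    [170, 65, 28, 'W'], [155, 48, 31, 'W'], [160, 55, 30, 'W'],
    [182, 80, 30, 'M'], [175, 69, 28, 'W'], [180, 80, 27, 'M'],
    [160, 50, 31, 'W'], [175, 72, 30, 'M'],
]


def gaussian(x, mean, variance):
    # Normal density at x for the given mean and variance
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def get_mean(values, index, label):
    # Fit mean and variance on the rows with `label`, evaluate on all rows
    filtered = [value[index] for value in values if value[3] == label]
    n = len(filtered)
    mean = sum(filtered) / n
    variance = sum((val - mean) ** 2 for val in filtered) / n
    return [gaussian(item[index], mean, variance) for item in values]


def class_values(values, label, n_features=3):
    # Hypothetical helper: one product per feature column for the given label
    return [math.prod(get_mean(values, i, label)) for i in range(n_features)]


values_by_label = {label: class_values(TRAIN_SET, label) for label in ('M', 'W')}
men_values = values_by_label['M']
women_values = values_by_label['W']
print(men_values)
print(women_values)
```

Keeping the results in a dictionary keyed by label means a third label would only need one more entry in the tuple.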