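I'm new to Python. I wrote some Python code to compute a Gaussian distribution and predict the label for a set of values. It was a class assignment, and I got a good grade for it. Now I'd like to know whether my code holds up from a Python point of view. Can I improve it to make it more concise and "Pythonic"?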
import math
import operator
# Class to get the mean and variance of all data point. Input paramaters
#are the Labels (M or W) and parameter to calculate (height, weight, age).
#samples are the number of data points
def getMean(trainingSet,parameter1,parameter2):
    mean =0
    samples = 0
    variance = 0
    for x in range(len(trainingSet)):
        if trainingSet[x][3]==parameter2:
            mean+= trainingSet[x][parameter1]
            samples = samples+1
    finalMean = mean/samples
    #print(finalMean)
    for x in range(len(trainingSet)):
        if trainingSet[x][3]==parameter2:
            variance+= (trainingSet[x][parameter1]-finalMean)**2
    finalVariance = variance/samples
    gausVal = []
    for x in range(len(trainingSet)):
        tempval = calculateGuassian(finalMean,finalVariance,trainingSet[x][parameter1])
        gausVal.append(tempval)
    return gausVal
#Class to calculate the gussaian distriubion points
def calculateGuassian(meanVal, varianceVal, feature1):
    DenoVariance = 2*varianceVal
    func1 = 1/(math.sqrt(2*3.14*varianceVal))
    func2 = (-(feature1-meanVal)**2)/DenoVariance
    func3 = math.exp(func2)
    distro = func1*func3
    return distro
def finalProduct(multiplyer):
    result = 1
    for x in multiplyer:
        result = result*x
    return result
def arrayMultiply(arr1, arr2) :
    resultArray = []
    for x in range(len(arr1)):
        arrMul = arr1[x]*arr2[x]
        resultArray.append(arrMul)
    return resultArray
# Main classes where every feature is calculated multiplied and the result
#is shown
def main() :
    MenArr = []
    WomenList = []
    heightM = getMean(trainSet,0,'M')
    finalHM = finalProduct(heightM)
    MenArr.append(finalHM)
    heightW = getMean(trainSet,0, 'W')
    finalHW = finalProduct(heightW)
    WomenList.append(finalHW)
    weightM = getMean(trainSet,1,'M')
    finalWM = finalProduct(weightM)
    MenArr.append(finalWM)
    weightW = getMean(trainSet,1,'W')
    finalWW = finalProduct(weightW)
    WomenList.append(finalWW)
    ageM = getMean(trainSet,2,'M')
    finalAM = finalProduct(ageM)
    MenArr.append(finalAM)
    ageW = getMean(trainSet,2,'W')
    finalAW = finalProduct(ageW)
    WomenList.append(finalAW)
    BestResultMTemp = arrayMultiply(MenArr,testData)
    BestResultWTemp = arrayMultiply(WomenList,testData)
    BestResultM = finalProduct(BestResultMTemp)*0.50
    BestResultW = finalProduct(BestResultWTemp)*0.50
    print (BestResultM)
    print(BestResultW)
    if BestResultM<BestResultW :
        print("The Class Label Is W")
    if BestResultM>BestResultW :
        print("The Class Label Is M")
trainSet = [[170, 57, 32, 'W'],
            [192, 95, 28, 'M'],
            [150, 45, 30, 'W'],
            [170, 65, 29, 'M'],
            [175, 78, 35, 'M'],
            [185, 90, 32, 'M'],
            [170, 65, 28, 'W'],
            [155, 48, 31, 'W'],
            [160, 55, 30, 'W'],
            [182, 80, 30, 'M'],
            [175, 69, 28, 'W'],
            [180, 80, 27, 'M'],
            [160, 50, 31, 'W'],
            [175, 72, 30, 'M']]
testData = (175, 70, 35)
main()
Any suggestions are most welcome. Thanks in advance.
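Answer 0 (score: 1)
Your question title does not reflect what you are actually asking. You would probably be better off at http://codereview.stackexchange.com/ ...
However, at first glance:
Read and follow PEP8, which defines good Python coding practice: https://www.python.org/dev/peps/pep-0008/
Maybe use something like autopep8 (https://pypi.org/project/autopep8/) or flake8 (http://flake8.pycqa.org/)
Look at how to use if __name__ == "__main__": (see: What does if __name__ == "__main__": do?)
Look into NumPy (http://www.numpy.org/), specifically for functions like arrayMultiply().
You call these "classes" in the header comments. However, no classes are defined here. So look into what Python classes actually are: https://docs.python.org/3/tutorial/classes.html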
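As a minimal sketch of the if __name__ == "__main__": suggestion: the guard keeps main() from running when the file is imported as a module (for example from a test). The body of main() here is only a placeholder, not the classification logic from the question.

```python
def main():
    # Placeholder body; in the question's script this would hold the
    # classification steps that currently run at module level.
    return "classification would run here"


if __name__ == "__main__":
    # Executed only when the file is run directly (python script.py),
    # not when it is imported from another module.
    print(main())
```

With this layout, the trailing bare main() call at the bottom of the script is replaced by the guarded block.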
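Answer 1 (score: 1)
As NichtJens said, try to follow the PEP8 guidelines: use lowercase variable names with underscores separating words, and add more spaces between characters.
Also try to use consistent and meaningful variable names. For example, why do you have MenArr and WomenList? You have a variable mean, but its value is a sum. You have temporary variables named func1, func2, and so on.
To make your for loops more Pythonic, iterate over the items in the list instead of creating an index and then looking the items up.
So instead of: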
gauss_value = []
for x in range(len(lst)):
    value = calculate_guassian(mean, variance, lst[x][parameter1])
    gauss_value.append(value)
return gauss_value
you can do:
gauss_value = []
for item in lst:
    value = calculate_guassian(mean, variance, item[parameter1])
    gauss_value.append(value)
return gauss_value
But better still, you can use a list comprehension:
gauss_value = [calculate_guassian(mean, variance, item[parameter1]) for item in lst]
You can use this to simplify a lot of your code. For example, arrayMultiply could be:
def list_multiply(list_1, list_2) :
    return [a * b for a, b in zip(list_1, list_2)]
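A quick usage sketch of the zip-based version, with made-up numbers. One thing to be aware of: zip stops at the shorter input, so inputs of different lengths are silently truncated, whereas the index-based arrayMultiply would raise an IndexError if arr2 were shorter than arr1.

```python
def list_multiply(list_1, list_2):
    # Pair up elements positionally and multiply each pair.
    return [a * b for a, b in zip(list_1, list_2)]


print(list_multiply([1, 2, 3], [4, 5, 6]))  # [4, 10, 18]
print(list_multiply([1, 2, 3], [4, 5]))     # [4, 10] (third element dropped)
```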
My version of getMean would filter the data first. I'm not sure whether it is correct for the calculateGuassian part to use the unfiltered data:
def get_mean(values, index, label):
    filtered_values = [value[index] for value in values if value[3] == label]
    n = len(filtered_values)
    mean = sum(filtered_values) / n
    summed_squared_difference = sum((val - mean) ** 2 for val in filtered_values)
    variance = summed_squared_difference / n
    return [calculateGuassian(mean, variance, item[index]) for item in values]
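For comparison, here is a sketch of the same idea built on the standard library. It assumes the population variance (dividing by n, as the code above does) is what is wanted, and it uses math.pi where the original calculateGuassian uses the 3.14 approximation. The names gaussian_pdf and class_densities are invented for this illustration. Like the version above, it still evaluates the density at every row, not only at the rows carrying the given label.

```python
import math
import statistics


def gaussian_pdf(mean, variance, x):
    # Density of a normal distribution with the given mean and variance.
    coefficient = 1 / math.sqrt(2 * math.pi * variance)
    exponent = -((x - mean) ** 2) / (2 * variance)
    return coefficient * math.exp(exponent)


def class_densities(rows, index, label):
    # Estimate mean and population variance from the rows carrying `label`
    # (assumption: the label is the last element of each row).
    filtered = [row[index] for row in rows if row[-1] == label]
    mean = statistics.mean(filtered)
    variance = statistics.pvariance(filtered, mean)
    # Evaluate the fitted density at every row, as the original code does.
    return [gaussian_pdf(mean, variance, row[index]) for row in rows]
```

Note that a class whose values are all identical has zero variance, which would make gaussian_pdf divide by zero; the sketch does not guard against that.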
You can also greatly reduce the amount of code needed to build the initial lists:
men_values = [product(get_mean(trainSet, i, 'M')) for i in range(3)]
women_values = [product(get_mean(trainSet, i, 'W')) for i in range(3)]
You can reduce the repetition further by using a function that takes 'M' or 'W' as a parameter and returns the relevant list.
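One possible shape for such a function, as a sketch: it takes the label plus a callable that produces the per-row values for one feature, and returns one product per feature. math.prod (Python 3.8+) stands in for the finalProduct helper. The names feature_products, values_for and column_values are hypothetical; in the real script, get_mean from above would be passed as values_for.

```python
import math


def feature_products(rows, label, values_for, n_features=3):
    # For each feature column, multiply together the values that
    # values_for(rows, index, label) returns (e.g. get_mean above).
    return [math.prod(values_for(rows, i, label)) for i in range(n_features)]


def column_values(rows, index, label):
    # Stand-in for get_mean so the sketch runs on its own: it returns
    # the raw column values of the rows with the given label.
    return [row[index] for row in rows if row[3] == label]


rows = [[1, 2, 3, 'M'], [4, 5, 6, 'M'], [7, 8, 9, 'W']]
print(feature_products(rows, 'M', column_values))  # [4, 10, 18]
print(feature_products(rows, 'W', column_values))  # [7, 8, 9]
```

The two near-identical comprehensions then become feature_products(trainSet, 'M', get_mean) and feature_products(trainSet, 'W', get_mean).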