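I'm working on a project that runs several functions over a few lists of data. I've already separated the lists and defined some functions that I believe work correctly: a mean function and a standard deviation function. My problem is that when I test with my lists I get the correct means and the correct standard deviations, but the wrong correlation coefficient. Could my math be off here? I need to find the correlation coefficient using only Python's standard library.
My code: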
def correlCo(someList1, someList2):
    # First establish the means and standard deviations for both lists.
    xMean = mean(someList1)
    yMean = mean(someList2)
    xStandDev = standDev(someList1)
    yStandDev = standDev(someList2)
    zList1 = []
    zList2 = []
    # Create 2 new lists taking (a[i]-a's Mean)/standard deviation of a
    for x in someList1:
        z1 = ((float(x)-xMean)/xStandDev)
        zList1.append(z1)
    for y in someList2:
        z2 = ((float(y)-yMean)/yStandDev)
        zList2.append(z2)
    # Mapping out the lists to be float values instead of string
    zList1 = list(map(float,zList1))
    zList2 = list(map(float,zList2))
    # Multiplying each value from the lists
    zFinal = [a*b for a,b in zip(zList1,zList2)]
    totalZ = 0
    # Taking the sum of all the products
    for a in zFinal:
        totalZ += a
    # Finally calculating correlation coefficient
    r = (1/(len(someList1) - 1)) * totalZ
    return r
SAMPLE RUN:
I have the lists [1,2,3,4,4,8] and [3,3,4,5,8,9].
I expect the correct answer r = 0.8848, but I get r = .203727
Edit: including the mean and standard deviation functions I wrote.
def mean(someList):
    total = 0
    for a in someList:
        total += float(a)
    mean = total/len(someList)
    return mean
def standDev(someList):
    newList = []
    sdTotal = 0
    listMean = mean(someList)
    for a in someList:
        newNum = (float(a) - listMean)**2
        newList.append(newNum)
    for z in newList:
        sdTotal += float(z)
    standardDeviation = sdTotal/(len(newList))
    return standardDeviation
Answer 0: (score: 2)
def mean(someList):
    total = 0
    for a in someList:
        total += float(a)
    mean = total/len(someList)
    return mean
def standDev(someList):
    # Note: this returns the square root of the summed squared deviations.
    # The usual 1/n factor is left out because it cancels in r below.
    listMean = mean(someList)
    dev = 0.0
    for i in range(len(someList)):
        dev += (someList[i]-listMean)**2
    dev = dev**(1/2.0)
    return dev
def correlCo(someList1, someList2):
    # First establish the means and standard deviations for both lists.
    xMean = mean(someList1)
    yMean = mean(someList2)
    xStandDev = standDev(someList1)
    yStandDev = standDev(someList2)
    # r numerator
    rNum = 0.0
    for i in range(len(someList1)):
        rNum += (someList1[i]-xMean)*(someList2[i]-yMean)
    # r denominator
    rDen = xStandDev * yStandDev
    r = rNum/rDen
    return r
print(correlCo([1,2,3,4,4,8], [3,3,4,5,8,9]))
0.884782972876
Answer 1: (score: 1)
The Pearson correlation can be computed with numpy's corrcoef.
import numpy
numpy.corrcoef(list1, list2)[0, 1]
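Since the question asks for the standard library only, here is a minimal sketch of the same calculation built on the `statistics` module. The function name `pearson_r` is my own and not from the thread. It uses the population standard deviation (`statistics.pstdev`) together with a covariance divided by n, so the two divisors stay consistent.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation using population statistics throughout."""
    if len(xs) != len(ys):
        raise ValueError("both lists must have the same length")
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Population covariance: the average product of the deviations.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)
    # pstdev divides by n, matching the covariance above.
    return covariance / (statistics.pstdev(xs) * statistics.pstdev(ys))

r = pearson_r([1, 2, 3, 4, 4, 8], [3, 3, 4, 5, 8, 9])
print(round(r, 4))  # 0.8848, the value the question expects
```

On Python 3.10 or newer, `statistics.correlation(xs, ys)` should give the same value without a hand-written helper.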
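Answer 2: (score: 1)
Your standard deviation is wrong. You forgot to take the square root. That function is actually returning the variance, not the standard deviation. @DeathPox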
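To make that concrete, here is a hedged sketch of the z-score approach from the question with the square root added. The names `sample_std_dev` and `correl_co` are mine. It also assumes the sample standard deviation (dividing by n - 1) is wanted, because the question's last line multiplies by 1/(n - 1). The question's `standDev` divides by n, so adding the square root alone would still leave the two divisors mismatched.

```python
import math

def mean(values):
    return sum(float(v) for v in values) / len(values)

def sample_std_dev(values):
    """Sample standard deviation: square root of the (n - 1)-divisor variance."""
    m = mean(values)
    variance = sum((float(v) - m) ** 2 for v in values) / (len(values) - 1)
    # Taking the square root is the step the question's standDev leaves out.
    return math.sqrt(variance)

def correl_co(xs, ys):
    """Pearson r as the sum of z-score products divided by (n - 1)."""
    x_mean, y_mean = mean(xs), mean(ys)
    x_sd, y_sd = sample_std_dev(xs), sample_std_dev(ys)
    z_products = [
        ((float(x) - x_mean) / x_sd) * ((float(y) - y_mean) / y_sd)
        for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (len(xs) - 1)

print(round(correl_co([1, 2, 3, 4, 4, 8], [3, 3, 4, 5, 8, 9]), 4))  # 0.8848
```

Dividing each deviation by the variance instead of the standard deviation is what shrinks the result to the .203727 reported in the question.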
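Answer 3: (score: 0)
Normally, according to the standard deviation formula, you should divide dev by the number of samples (the length of the list) before taking the square root, right? I mean: dev += ((someList[i]-listMean)**2)/len(someList)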
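A short derivation may help here. This is my own working, not part of the thread. If the 1/n factor is applied consistently to the covariance in the numerator and to both standard deviations in the denominator, it cancels:

```latex
r = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}\,
          \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2}}
  = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}
```

So the correlCo in Answer 0 gives the right r even though its standDev skips the division. If the division by len(someList) were added to standDev alone, rNum would also need dividing by len(someList), otherwise r would come out n times too large.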