Given the following dataset:
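from pandas import DataFrame

# '' marks a blank (missing) cell, as in the original data
Data = {
    'a1': ['', 0, '', 0.01, 0, 0.03, 0.01, 0.01, '', 0, 0, 0, 0.01, 0.01, 0, 0.01, 0, 0.01, 0.01, 0.01, 0, '', '', '', 0, 0, 0.01, 0.01, 0.02, 0.03, 0],
    'a2': [''] * 23 + [0, 0, '', '', 0, '', '', ''],
    'a3': ['', 0, 0.02, '', '', 0, 0.01, 0.03, 0, 0.01, 0.01, 0.02, '', '', '', '', '', '', '', 0.01, 0.01, 0, 0.01, 0.02, 0, 0, 0.02, 0, 0, 0, 0.05],
    'a4': ['', 0, 0, '', 0, 0, '', 0, '', '', '', '', '', 0, 0, 0, '', 0, 0, 0, 0, 0, 0, 0, 0, '', '', '', '', '', ''],
    'a5': ['', 0, 0, 0, 0, '', 0, '', '', '', 0] + [''] * 20,
    'a6': ['', 0.01, 0, 0, 0.01, 0.01, 0, 0.01, 0, 0.01, 0.01, 0, 0.01, 0.01, 0, 0.01, 0.01, 0, 0, 0, 0.01, 0.01, 0.03, 0.01, 0.01, 0.01, 0, 0.01, 0, 0.01, 0],
}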
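How can I create a correlation matrix with pandas when some of the columns contain blank/empty values? (Those values should be ignored.)
I tried reducing min_periods to 0: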
df = DataFrame(Data, columns=['a1', 'a2', 'a3', 'a4', 'a5', 'a6'])
corrMatrix = df.corr(min_periods=0)
print(corrMatrix)
Answer (score: 1)
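The problem you are running into is actually the column types. Because you use empty strings to represent missing values, pandas treats those columns as string columns (dtype object) rather than numeric ones when it creates the DataFrame.
So you need to convert the columns to numbers before computing the correlation: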
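To see the dtype problem on a small scale, here is a minimal sketch using a hypothetical two-column sample (the names sample, x and y are made up for illustration) in which '' marks the blanks. Because each column mixes str and float objects, pandas stores both as object and neither is selected as numeric.

```python
import pandas as pd

# Hypothetical sample: '' stands for a blank cell, as in the question.
sample = pd.DataFrame({
    "x": ["", 0.01, 0.02, 0.0],
    "y": [0.0, "", 0.03, 0.01],
})

# Mixed str/float values force pandas to fall back to the generic object dtype.
print(sample.dtypes)

# No column is numeric, so corr() cannot treat these columns as numbers yet.
numeric_cols = sample.select_dtypes(include="number").columns
print(list(numeric_cols))  # []
```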
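import pandas as pd

# errors='coerce' turns the empty strings (and anything else non-numeric) into NaN
df = df.apply(pd.to_numeric, errors='coerce')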
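A minimal sketch of what the conversion does, again on a hypothetical sample rather than the question's data: DataFrame.apply forwards the errors keyword to pd.to_numeric for every column, the '' cells become NaN, and the columns become float64.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question: '' stands for "no value".
raw = pd.DataFrame({
    "x": ["", 0.01, 0.02, 0.0],
    "y": [0.0, "", 0.03, 0.01],
})

# apply() passes errors="coerce" through to pd.to_numeric for each column.
clean = raw.apply(pd.to_numeric, errors="coerce")

print(clean.dtypes)        # both columns are now float64
print(clean.isna().sum())  # one NaN per column, where the '' used to be
```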
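Then you can call corr. It excludes NaN values pairwise, so each coefficient is computed only from the rows where both columns have a value, which is the "ignore the blanks" behavior you are after: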
df.corr(method='pearson')
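To check the pairwise handling of missing values, this sketch builds a small hypothetical float DataFrame with NaN gaps and compares the coefficient from DataFrame.corr with one computed by np.corrcoef on only the rows where both columns are present. With just two columns, dropna() keeps exactly those rows.

```python
import numpy as np
import pandas as pd

# Hypothetical float data with NaN gaps (what the columns look like after conversion).
df = pd.DataFrame({
    "x": [np.nan, 1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, np.nan, 1.0, 4.0, 3.0, 6.0],
})

# corr() skips NaN pairwise: the x/y coefficient uses only rows where both are present.
pandas_r = df.corr(method="pearson").loc["x", "y"]

# The same Pearson coefficient computed by hand on the complete rows only.
both = df.dropna()
manual_r = np.corrcoef(both["x"], both["y"])[0, 1]

print(pandas_r, manual_r)
```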
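Note that min_periods=0 is not useful with pearson correlation: a Pearson coefficient needs at least two paired, non-missing observations, and corr already skips the missing values pair by pair.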
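What min_periods is actually for: per the pandas documentation it is the minimum number of observations required per pair of columns to get a valid result. The sketch below uses hypothetical data and an arbitrarily chosen threshold of 3. With the default, x/z gets a coefficient from only two points, which is trivially 1.0. With min_periods=3 that pair is reported as NaN instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data: x/y share 4 complete rows, x/z share 2, y/z share only 1.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 1.0, 4.0, 3.0, np.nan],
    "z": [np.nan, np.nan, np.nan, 1.0, 2.0],
})

# Default min_periods: x/z gets a coefficient from just two points,
# while y/z (a single shared point, zero variance) comes out as NaN.
loose = df.corr(method="pearson")

# Require at least 3 paired non-NaN observations before reporting a coefficient.
strict = df.corr(method="pearson", min_periods=3)

print(loose)
print(strict)
```

Choosing the threshold is a judgment call: a higher value hides coefficients that rest on only a handful of overlapping rows.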