Correlation matrix with pandas when the data contains empty values

Time: 2020-03-05 15:13:01

Tags: python pandas matrix correlation

Given the following dataset:

from pandas import DataFrame

# Missing values are entered as empty strings ('').
Data = {
    'a1': ['',0,'',0.01,0,0.03,0.01,0.01,'',0,0,0,0.01,0.01,0,0.01,0,0.01,0.01,0.01,0,'','','',0,0,0.01,0.01,0.02,0.03,0],
    'a2': ['','','','','','','','','','','','','','','','','','','','','','','',0,0,'','',0,'','',''],
    'a3': ['',0,0.02,'','',0,0.01,0.03,0,0.01,0.01,0.02,'','','','','','','',0.01,0.01,0,0.01,0.02,0,0,0.02,0,0,0,0.05],
    'a4': ['',0,0,'',0,0,'',0,'','','','','',0,0,0,'',0,0,0,0,0,0,0,0,'','','','','',''],
    'a5': ['',0,0,0,0,'',0,'','','',0,'','','','','','','','','','','','','','','','','','','',''],
    'a6': ['',0.01,0,0,0.01,0.01,0,0.01,0,0.01,0.01,0,0.01,0.01,0,0.01,0.01,0,0,0,0.01,0.01,0.03,0.01,0.01,0.01,0,0.01,0,0.01,0],
    }

How can I create a correlation matrix with pandas when some of the columns contain blank/empty values? (Those values should be ignored.)

I tried reducing min_periods to 0:

df = DataFrame(Data,columns=['a1','a2','a3','a4','a5','a6'])

corrMatrix = df.corr(min_periods=0)
print(corrMatrix)

1 answer:

Answer 0 (score: 1)

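The problem you are running into is actually the column type. Because you use empty strings to represent missing values, pandas treats those columns as string (i.e. object) columns rather than numeric ones when it builds the DataFrame.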
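One way to check this diagnosis is to look at the column dtypes. The sketch below uses a small hypothetical frame (`toy`, not the dataset from the question) in which one column contains an empty string and the other does not:

```python
import pandas as pd

# Hypothetical toy data: '' marks a missing value, as in the question.
toy = pd.DataFrame({
    'x': ['', 0.01, 0.02, 0.03],   # mixes str and float
    'y': [0.0, 0.01, 0.0, 0.02],   # floats only
})

# A column that mixes strings and floats cannot be stored as float64,
# so pandas does not treat it as numeric.
x_is_numeric = pd.api.types.is_numeric_dtype(toy['x'])
y_is_numeric = pd.api.types.is_numeric_dtype(toy['y'])

print(toy.dtypes)
print(x_is_numeric, y_is_numeric)
```

A column that reports a non-numeric dtype here is one that needs converting before the correlation is computed.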

So, before computing the correlation, you need to convert the columns:

import pandas as pd  # the question only imports DataFrame
df = df.apply(pd.to_numeric)

Then you can make the correlation call:

df.corr(method='pearson')
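As a minimal end-to-end sketch of the two steps, here is the same idea on a small hypothetical frame (`toy` and its values are made up for illustration). Passing `errors='coerce'` is an assumption added here, not part of the answer: it asks `pd.to_numeric` to turn anything it cannot parse into NaN.

```python
import pandas as pd

# Hypothetical toy data; '' marks a missing value.
toy = pd.DataFrame({
    'x': ['', 1.0, 2.0, 3.0, 4.0],
    'y': [2.0, 4.0, '', 8.0, 10.0],
})

# Convert every column to a numeric dtype; unparseable entries become NaN.
numeric = toy.apply(pd.to_numeric, errors='coerce')

# corr() skips NaN pairwise, so only rows where both columns
# have a value contribute to each coefficient.
corr = numeric.corr(method='pearson')
print(numeric.dtypes)
print(corr)
```

In this toy frame the three rows where both columns are present lie exactly on the line y = 2x + 2, so the off-diagonal coefficient comes out as 1.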

Note that min_periods=0 does not work with the pearson method.
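For context, `min_periods` sets the minimum number of observations a pair of columns must share for `corr` to report a value. The sketch below illustrates that effect on hypothetical data (`toy`, `loose`, and `strict` are names made up here); it is not taken from the answer:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'z' has only two non-null rows.
toy = pd.DataFrame({
    'x': [1.0, 2.0, 3.0, 4.0, 5.0],
    'y': [2.0, 1.0, 4.0, 3.0, 6.0],
    'z': [np.nan, np.nan, np.nan, 1.0, 2.0],
})

# Default min_periods: x and z share two rows, which is enough for a value.
loose = toy.corr(method='pearson')

# Require at least three shared rows: the x/z pair no longer qualifies.
strict = toy.corr(method='pearson', min_periods=3)

print(loose)
print(strict)
```

Raising `min_periods` is a way to blank out coefficients that rest on very few overlapping rows, which may be relevant for sparse columns such as `a2` and `a5` in the question.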