Correlation matrix from a Pandas DataFrame returns NaN values

Time: 2019-07-29 22:40:52

Tags: python pandas dataframe nan correlation

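I have several large datasets and need to find the correlations between them. The data is converted to pandas DataFrames, and I use pd.DataFrame.corr() to compute the correlations. This works for some datasets but not for others, and I'm not sure why.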

The values in the datasets that fail are not all identical, so the standard deviation is not 0. Every column of the DataFrame has dtype float64.
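For reference, the zero-variance case ruled out above can be reproduced with a small sketch. The data and column names here are hypothetical, chosen only to show why a constant column would matter: Pearson correlation divides by the standard deviation, so a column with S.D. 0 gives NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "b" is constant, so its standard deviation is 0.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [5.0, 5.0, 5.0, 5.0]})

print(df.dtypes)  # both columns are float64
print(df.std())   # "b" has a standard deviation of 0

# The Pearson denominator is 0 for "b", so its entries come out as NaN.
corr_mat = df.corr()
print(corr_mat)
```

If df.std() shows no zeros and the dtypes are all float64, as in my case, this is not the cause.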

The code works for:

                               BPM1401-01:x  BPM1401-01:y
2019-07-23 05:59:59.641471863      0.000052     -0.000108  
2019-07-23 06:00:00.033471822      0.000050     -0.000108  
2019-07-23 06:00:00.425471783           NaN     -0.000108  
2019-07-23 06:00:00.816471815      0.000051           NaN  
2019-07-23 06:00:01.170471907      0.000050           NaN  
2019-07-23 06:00:01.954471827      0.000049           NaN  
2019-07-23 06:00:02.345471859      0.000051           NaN  
2019-07-23 06:00:02.737471819      0.000051     -0.000108  
2019-07-23 06:00:03.090471745      0.000052     -0.000108  
2019-07-23 06:00:03.481471777      0.000051     -0.000109  

But not for:

                               SR1:BPMXRMSGlobal  SR1:BPMYRMSGlobal
2019-07-23 05:59:58.197318077           1.096721                NaN  
2019-07-23 05:59:58.197477102                NaN           1.586067  
2019-07-23 06:00:01.471035957                NaN           0.772168  
2019-07-23 06:00:02.132909060           1.553643                NaN  
2019-07-23 06:00:02.132987022                NaN           1.209081  
2019-07-23 06:00:02.793922901           2.558707                NaN  
2019-07-23 06:00:02.793971062                NaN           1.624215  
2019-07-23 06:00:03.440277100           2.508732                NaN  
2019-07-23 06:00:03.440378904                NaN           1.540483  
2019-07-23 06:00:04.094022036           2.325517                NaN

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sb


# Align the data using the timestamps (already done for the sets above).
def align_dataframes(data_frame_list):
    # Start from the first DataFrame
    curr_df = data_frame_list[0]

    # Outer-join each remaining DataFrame on the timestamp index
    for next_df in data_frame_list[1:]:
        curr_df = curr_df.join(next_df, how='outer')

    # Return the aligned DataFrame
    return curr_df


def plot_corr(data_frame):
    print(data_frame.dtypes)  # every column is float64

    # Compute the correlation matrix and draw it as a heat map
    corr_mat = data_frame.corr(method='pearson', min_periods=1)
    sb.heatmap(corr_mat, linewidths=.5)

    plt.show()

It seems to me that the second DataFrame should work just as well, but corr() ends up returning a matrix of NaN values.

1 Answer:

Answer 0: (score: 0)

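The second DataFrame has no rows in which both values are non-null, so there are no data points on which to compute a correlation. corr() only uses rows where both columns of a pair hold a value; in the first DataFrame several rows qualify, while in the second every row has a NaN in one column or the other.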
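A minimal sketch of how this could be checked, using the ten rows from the second dataset in the question. The shortened column names x_rms and y_rms are placeholders, and the final interpolation step is only one possible workaround, an assumption on my part rather than something the question calls for. Whether interpolating is appropriate depends on what the signals represent.

```python
import numpy as np
import pandas as pd

# Rows copied from the second dataset in the question
# (column names shortened to x_rms / y_rms for readability).
index = pd.to_datetime([
    "2019-07-23 05:59:58.197318077",
    "2019-07-23 05:59:58.197477102",
    "2019-07-23 06:00:01.471035957",
    "2019-07-23 06:00:02.132909060",
    "2019-07-23 06:00:02.132987022",
    "2019-07-23 06:00:02.793922901",
    "2019-07-23 06:00:02.793971062",
    "2019-07-23 06:00:03.440277100",
    "2019-07-23 06:00:03.440378904",
    "2019-07-23 06:00:04.094022036",
])
nan = np.nan
df = pd.DataFrame(
    {
        "x_rms": [1.096721, nan, nan, 1.553643, nan,
                  2.558707, nan, 2.508732, nan, 2.325517],
        "y_rms": [nan, 1.586067, 0.772168, nan, 1.209081,
                  nan, 1.624215, nan, 1.540483, nan],
    },
    index=index,
)

# For each pair of columns, count the rows where both are non-null.
valid = df.notna().astype(int)
pair_counts = valid.T.dot(valid)
print(pair_counts)  # the off-diagonal count is 0

# With no shared rows, the off-diagonal correlation is NaN.
corr_mat = df.corr(method="pearson", min_periods=1)
print(corr_mat)

# Possible workaround (an assumption, not part of the answer above):
# interpolate along the time index so both columns have values on shared rows.
filled = df.interpolate(method="time")
filled_corr = filled.corr(method="pearson", min_periods=1)
print(filled_corr)
```

The pair_counts table shows how many observations each entry of the correlation matrix is based on, so a 0 there explains a NaN in the same position of corr_mat.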