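I'm trying to correlate the same column across two different DataFrames of the same size. The DataFrames hold stock data with a DatetimeIndex. Every way of computing the correlation that I've tried returns only NaN. Is the dtype of the DataFrames messed up somehow? Note: at this point in the program I don't care what the actual dates/index values are.
Input: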
import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like # temp fix
import numpy as np
import fix_yahoo_finance as yf
from pandas_datareader import data, wb
from datetime import date
df1 = yf.download('IBM', start = date (2000, 1, 3), end = date (2000, 1, 5), progress = False)
df2 = yf.download('IBM', start = date (2000, 1, 6), end = date (2000, 1, 10), progress = False)
print (df1)
print (df2)
print (df1['Open'].corr(df2['Open']))
Output:
Open High Low Close Adj Close Volume
Date
2000-01-03 112.4375 116.00 111.875 116.0000 81.096031 10347700
2000-01-04 114.0000 114.50 110.875 112.0625 78.343300 8227800
2000-01-05 112.9375 119.75 112.125 116.0000 81.096031 12733200
Open High Low Close Adj Close Volume
Date
2000-01-06 118.00 118.9375 113.500 114.0 79.697784 7971900
2000-01-07 117.25 117.9375 110.625 113.5 79.348267 11856700
2000-01-10 117.25 119.3750 115.375 118.0 82.494217 8540500
nan
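Answer 0: (score: 0)
The indexes don't match, which is why I was getting NaN. Series.corr aligns the two Series on their index labels before computing anything, and since the two date ranges share no dates, there are no pairs left to correlate. Using numpy.corrcoef on the raw values ignores the index and gives a result: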
np.corrcoef(df1['Open'].values,df2['Open'].values)
[[ 1. -0.74615579]
[-0.74615579 1. ]]
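To make the alignment behaviour easier to see without downloading anything, here is a minimal sketch that rebuilds the two 'Open' columns by hand from the printed output above. The variable names (open1, open2, and so on) are made up for illustration, and reset_index(drop=True) is shown only as one possible pandas-only alternative to numpy.corrcoef, assuming you really do want to pair the rows by position.

```python
import numpy as np
import pandas as pd

# 'Open' values copied from the printed output above (hypothetical variable names).
open1 = pd.Series(
    [112.4375, 114.0, 112.9375],
    index=pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"]),
)
open2 = pd.Series(
    [118.0, 117.25, 117.25],
    index=pd.to_datetime(["2000-01-06", "2000-01-07", "2000-01-10"]),
)

# Series.corr pairs values by index label. The two date ranges do not
# overlap, so no pairs remain and the result is NaN.
aligned_corr = open1.corr(open2)

# Replacing both indexes with a default 0..n-1 range pairs the rows by position.
positional_corr = open1.reset_index(drop=True).corr(open2.reset_index(drop=True))

# Same idea with NumPy: raw arrays carry no index at all.
numpy_corr = np.corrcoef(open1.to_numpy(), open2.to_numpy())[0, 1]

print(aligned_corr)
print(positional_corr)
print(numpy_corr)
```

Both positional versions should agree with the -0.74615579 shown above. Keep in mind that pairing rows by position only makes sense when the order of the rows carries the meaning you want, as it does here.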