Question

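I have two pandas DataFrames. From each I extracted a single column and set the dates column as the index, so I now have two Series. I need to find the correlation between these Series.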

Here are a few rows from dfd:

index      change
2018-12-31  -0.86
2018-12-30  0.34
2018-12-27  -0.94
2018-12-26  -1.26
2018-12-25  3.30
2018-12-24  -4.17

And from dfp:

index      change
2018-12-31  0.55
2018-12-30  0.81
2018-12-27  -2.99
2018-12-26  0.50
2018-12-25  3.59
2018-12-24  -3.43

I tried:

correlation=dfp.corr(dfd)

and got the following error:

TypeError: unsupported operand type(s) for /: 'str' and 'int'

Answer 1

The problem is that dfp holds string representations of numbers, so use Series.astype to convert to floats:

correlation = dfp.astype(float).corr(dfd.astype(float))
print(correlation)
0.8624789983270312
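As a minimal sketch of the conversion above, the following rebuilds the six sample rows from the question as Series of strings (an assumption about how the data was loaded; the real frames may differ) and converts both sides before calling corr. The names dfd_num and dfp_num are illustrative only.

```python
import pandas as pd

# Sample rows from the question, deliberately stored as strings (assumption)
dates = pd.to_datetime(
    ["2018-12-31", "2018-12-30", "2018-12-27",
     "2018-12-26", "2018-12-25", "2018-12-24"]
)
dfd = pd.Series(["-0.86", "0.34", "-0.94", "-1.26", "3.30", "-4.17"],
                index=dates, name="change")
dfp = pd.Series(["0.55", "0.81", "-2.99", "0.50", "3.59", "-3.43"],
                index=dates, name="change")

print(dfp.dtype)  # a non-numeric dtype, because the values are strings

# Convert both Series to float, then compute the Pearson correlation
dfd_num = dfd.astype(float)
dfp_num = dfp.astype(float)
correlation = dfp_num.corr(dfd_num)
print(correlation)
```

Checking the dtype first is a quick way to confirm whether this kind of conversion is needed at all.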

If the first solution fails because of some non-numeric values, use to_numeric with errors='coerce'; non-numeric values are converted to missing values:

correlation=pd.to_numeric(dfp, errors='coerce').corr(dfd)
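To illustrate the coercion, here is a small sketch in which one entry of dfp is a made-up placeholder string ("n/a", not present in the original data). to_numeric turns it into NaN, and corr then leaves that date out of the calculation.

```python
import pandas as pd

dates = pd.to_datetime(
    ["2018-12-31", "2018-12-30", "2018-12-27",
     "2018-12-26", "2018-12-25", "2018-12-24"]
)
dfd = pd.Series([-0.86, 0.34, -0.94, -1.26, 3.30, -4.17], index=dates)

# Hypothetical dirty input: the third entry is not a number
dfp_raw = pd.Series(["0.55", "0.81", "n/a", "0.50", "3.59", "-3.43"],
                    index=dates)

# Values that cannot be parsed become NaN instead of raising an error
dfp_num = pd.to_numeric(dfp_raw, errors="coerce")
print(dfp_num.isna().sum())  # how many values could not be parsed

# corr skips the pairs in which either value is missing
correlation = dfp_num.corr(dfd)
print(correlation)
```

Counting the NaN values after coercion is worthwhile, because silently dropped rows can change the result noticeably on short series.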

Answer 2

You can merge the two DataFrames and correlate the columns:

import pandas as pd

# Assumes each frame still has 'date' and 'change' columns
dfd['date'] = pd.to_datetime(dfd['date'])
dfd.set_index('date', inplace=True)

dfp['date'] = pd.to_datetime(dfp['date'])
dfp.set_index('date', inplace=True)

# Inner join on the date index; the suffixes rename the overlapping
# 'change' columns to 'change(dfp)' and 'change(dfd)'
df = pd.merge(dfp, dfd, left_index=True, right_index=True,
              suffixes=('(dfp)', '(dfd)')).reset_index()
df

Correlate the two change columns, change(dfd) and change(dfp):

df['change(dfp)'].corr(df['change(dfd)'])
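The merge approach can be sketched end to end as follows, assuming each frame has a 'date' column and a numeric 'change' column as in the question's sample rows. The suffixes argument is one way to obtain the column names change(dfp) and change(dfd); without it, pandas would name the overlapping columns change_x and change_y.

```python
import pandas as pd

dates = ["2018-12-31", "2018-12-30", "2018-12-27",
         "2018-12-26", "2018-12-25", "2018-12-24"]
dfd = pd.DataFrame({"date": dates,
                    "change": [-0.86, 0.34, -0.94, -1.26, 3.30, -4.17]})
dfp = pd.DataFrame({"date": dates,
                    "change": [0.55, 0.81, -2.99, 0.50, 3.59, -3.43]})

# Parse the dates and use them as the index of each frame
for frame in (dfd, dfp):
    frame["date"] = pd.to_datetime(frame["date"])
    frame.set_index("date", inplace=True)

# Inner join on the index keeps only the dates present in both frames
df = pd.merge(dfp, dfd, left_index=True, right_index=True,
              suffixes=("(dfp)", "(dfd)"))
print(df.columns.tolist())  # ['change(dfp)', 'change(dfd)']

correlation = df["change(dfp)"].corr(df["change(dfd)"])
print(correlation)
```

Series.corr already aligns its two inputs on their index, so the merge mainly serves to make the date pairing visible for inspection.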


Finding the correlation between pandas time series
