How do I perform calculations on DataFrames or Series with different indexes in pandas?

Date: 2016-06-17 00:53:12

Tags: python pandas dataframe series quandl

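I have two Series with the same length and data type; both are float64. The only difference is their indexes: both are dates, but one series is dated at the beginning of each month and the other at the end of each month. How can I compute things like correlation or covariance on Series or DataFrames that have different indexes?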

import numpy as np
from pandas import Series, DataFrame
import pandas as pd
import Quandl

IPO = Quandl.get("RITTER/US_IPO_STATS", authtoken="api key")
ir = Quandl.get("FRBC/REALRT", authtoken="api key")

# Take 398 rows from each dataset by position
ipo_splice = IPO.iloc[264:662]
new_ipo = ipo_splice['Gross Number of IPOs']

ir_splice = ir.iloc[0:398]
new_ir = ir_splice['RR 1 Month']

# One index holds month-start dates, the other month-end dates,
# and corr() pairs values by index label
new_ipo.corr(new_ir)
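Why the last line does not give a useful number: `Series.corr` first aligns both series on their index labels, and a month-start date never equals a month-end date, so no pairs survive and the result is NaN. A small self-contained sketch with made-up values (hypothetical stand-ins, not the Quandl data) that reproduces this:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two series: same length, float64,
# one indexed at month start, the other at month end.
start = pd.date_range("2015-01-01", periods=6, freq="MS")
end = start + pd.offsets.MonthEnd(0)  # roll each date forward to its month end

a = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=start)
b = pd.Series([2.0, 4.1, 5.9, 8.2, 9.9, 12.1], index=end)

# corr() aligns on labels first; no label is shared, so the result is NaN.
aligned = a.corr(b)
print(aligned)

# Correlating the raw values ignores the index entirely (pairs by position).
positional = np.corrcoef(a.to_numpy(), b.to_numpy())[0, 1]
print(positional)
```

Pairing by position is only meaningful if row i of each series really covers the same month; the answers below show ways to make the labels themselves match.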

2 answers:

Answer 0 (score: 0)

Call reset_index(drop=True) on each object you want to correlate, then concat them side by side.

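import numpy as np
import pandas as pd

s1 = pd.DataFrame(np.random.rand(10), index=list('abcdefghij'), columns=['s1'])
s2 = pd.DataFrame(np.random.rand(10), index=list('ABCDEFGHIJ'), columns=['s2'])

# Replace each index with 0..9 so rows pair up by position, then put them side by side
print(pd.concat([s.reset_index(drop=True) for s in [s1, s2]], axis=1).corr())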


          s1        s2
s1  1.000000 -0.437945
s2 -0.437945  1.000000
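The same positional idea works directly on two Series, without building a DataFrame, and covers covariance as well. A minimal sketch with hypothetical fixed values; it assumes the two series have the same length and that row i of one belongs with row i of the other, because the original labels are discarded:

```python
import pandas as pd

# Hypothetical data: two equally long series whose labels never match.
s1 = pd.Series([0.3, 0.8, 0.1, 0.9, 0.5], index=list("abcde"))
s2 = pd.Series([0.2, 0.7, 0.3, 0.8, 0.4], index=list("ABCDE"))

# Drop the labels so both share the default RangeIndex 0..n-1,
# which makes corr()/cov() pair values purely by position.
left = s1.reset_index(drop=True)
right = s2.reset_index(drop=True)

corr = left.corr(right)
cov = left.cov(right)
print(corr, cov)

# Without reset_index the labels do not overlap and both results are NaN.
print(s1.corr(s2), s1.cov(s2))
```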

Answer 1 (score: 0)

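You can use resample() to relabel one of the indexes so that both follow the same convention, either beginning of month (BoM) or end of month (EoM). With exactly one observation per month, resample('MS').first() simply moves each month-end date to the first day of its month: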

Data:

In [63]: df_bom
Out[63]:
            val
2015-01-01   76
2015-02-01   27
2015-03-01   65
2015-04-01   71
2015-05-01    9
2015-06-01   23
2015-07-01   52
2015-08-01   10
2015-09-01   62
2015-10-01   25

In [64]: df_eom
Out[64]:
            val
2015-01-31   87
2015-02-28   16
2015-03-31   85
2015-04-30    4
2015-05-31   37
2015-06-30   63
2015-07-31    3
2015-08-31   73
2015-09-30   81
2015-10-31   69

Solution:

In [61]: df_eom.resample('MS').first() + df_bom
Out[61]:
            val
2015-01-01  163
2015-02-01   43
2015-03-01  150
2015-04-01   75
2015-05-01   46
2015-06-01   86
2015-07-01   55
2015-08-01   83
2015-09-01  143
2015-10-01   94

In [62]: df_eom.resample('MS').first().join(df_bom, lsuffix='_lft')
Out[62]:
            val_lft  val
2015-01-01       87   76
2015-02-01       16   27
2015-03-01       85   65
2015-04-01        4   71
2015-05-01       37    9
2015-06-01       63   23
2015-07-01        3   52
2015-08-01       73   10
2015-09-01       81   62
2015-10-01       69   25
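Another way to get shared labels, not taken from this answer: convert both DatetimeIndexes to monthly periods with `to_period('M')`, so 2015-01-01 and 2015-01-31 both become the period 2015-01. Ordinary label alignment then works for arithmetic, `corr` and `cov`. A sketch assuming each month appears at most once per series; the values are fixed illustrative numbers shaped like `df_bom` / `df_eom`:

```python
import pandas as pd

# Illustrative frames shaped like df_bom / df_eom above.
bom_idx = pd.date_range("2015-01-01", periods=4, freq="MS")
df_bom = pd.DataFrame({"val": [76, 27, 65, 71]}, index=bom_idx)
df_eom = pd.DataFrame({"val": [87, 16, 85, 4]}, index=bom_idx + pd.offsets.MonthEnd(0))

# Both DatetimeIndexes collapse to the same PeriodIndex (2015-01 ... 2015-04).
bom = df_bom["val"].to_period("M")
eom = df_eom["val"].to_period("M")

total = bom + eom   # aligned month by month
c = bom.corr(eom)   # the labels now match, so this is not NaN
print(total)
print(c)
```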

Alternative approach: merge the DataFrames on the year and month parts of their indexes:

In [69]: pd.merge(df_bom, df_eom,
    ...:          left_on=[df_bom.index.year, df_bom.index.month],
    ...:          right_on=[df_eom.index.year, df_eom.index.month],
    ...:          suffixes=('_bom', '_eom'))
Out[69]:
   key_0  key_1  val_bom  val_eom
0   2015      1       76       87
1   2015      2       27       16
2   2015      3       65       85
3   2015      4       71        4
4   2015      5        9       37
5   2015      6       23       63
6   2015      7       52        3
7   2015      8       10       73
8   2015      9       62       81
9   2015     10       25       69
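From the merged frame, the correlation and covariance the question asks for are just those of the two value columns. A sketch that repeats the same merge call on small fixed values (illustrative numbers, not the random data from the session):

```python
import pandas as pd

# Illustrative frames: month-start vs month-end labels for the same months.
idx_bom = pd.date_range("2015-01-01", periods=4, freq="MS")
df_bom = pd.DataFrame({"val": [76, 27, 65, 71]}, index=idx_bom)
df_eom = pd.DataFrame({"val": [87, 16, 85, 4]}, index=idx_bom + pd.offsets.MonthEnd(0))

# Join on (year, month) computed from each index.
merged = pd.merge(
    df_bom, df_eom,
    left_on=[df_bom.index.year, df_bom.index.month],
    right_on=[df_eom.index.year, df_eom.index.month],
    suffixes=("_bom", "_eom"),
)

# Rows are now paired by month, so corr/cov compare the right values.
corr = merged["val_bom"].corr(merged["val_eom"])
cov = merged["val_bom"].cov(merged["val_eom"])
print(corr, cov)
```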

Setup:

In [59]: df_bom = pd.DataFrame({'val':np.random.randint(0,100, 10)}, index=pd.date_range('2015-01-01', periods=10, freq='MS'))

In [60]: df_eom = pd.DataFrame({'val':np.random.randint(0,100, 10)}, index=pd.date_range('2015-01-01', periods=10, freq='M'))  # 'M' = month end; pandas >= 2.2 spells it 'ME'