Creating new series from combinations of series in a data frame

Time: 2015-12-17 16:06:41

Tags: python pandas time-series series

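I have monthly returns for the market and for different sectors. I want to compute each sector's excess return by subtracting the market return from the sector return. Then I want to compute the correlations among all of the sector excess returns.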

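I have a data frame with 9 series (a monthly date series and 8 monthly return series). I want to form 7 new series by taking each of series 3 through 9 and subtracting series 2 (i.e., series 10 is series 3 - series 2, series 11 is series 4 - series 2, and so on). The new series should have the same labels as the original series, prefixed with "Excess". I can do this one series at a time, i.e. df["series10"] = df["series3"] - df["series2"], but how can I do it with a "for" statement, referring to the series by number? Also, how do I compute the correlation between two series in a data frame? Thanks in advance.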

2 answers:

Answer 0: (score: 2)

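You can use a simple concat to combine all of the series into a data frame, then use a for loop to perform the operation and create the new columns you need.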

In [1]: import pandas as pd
# Create some dummy data for illustration
In [2]: s_names =  pd.Series(['a','b','c','d','f'], name = 'name')

In [3]: s1 = pd.Series([1,2,3,4,5], name = 's1')

In [4]: s2 = pd.Series([10,20,30,40,50], name='s2')

In [5]: s3 = pd.Series([100,200,300,400,500], name='s3')

# Use concat to create a new dataframe consisting of all the series
In [6]: data = pd.concat([s_names, s1, s2, s3], axis=1)

In [7]: data_columns = data.columns

In [8]: from itertools import combinations
# Generate the pairs of columns on which to perform the operation.
# You can also supply a custom list of pairs here.

In [9]: comb = list(combinations(data_columns[1:], 2))

In [10]: for c in comb:
   ....:     data[c[1] + "_" + c[0]] = data[c[1]] - data[c[0]]

In [11]: data
Out[11]: 
  name  s1  s2   s3  s2_s1  s3_s1  s3_s2
0    a   1  10  100      9     99     90
1    b   2  20  200     18    198    180
2    c   3  30  300     27    297    270
3    d   4  40  400     36    396    360
4    f   5  50  500     45    495    450
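The loop above builds every pairwise difference. For the case in the question, where one column (the market) is subtracted from each of the later columns and the result gets an "Excess" prefix, a minimal sketch could look like the following. The column names and values here are made up for illustration, and the column positions (date first, market second) are an assumption taken from the question.

```python
import pandas as pd

# Hypothetical data: a date column, a market column, then two sector columns.
df = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-31", "2015-02-28", "2015-03-31"]),
    "market": [0.01, 0.02, -0.01],
    "tech": [0.03, 0.01, 0.00],
    "energy": [-0.02, 0.04, 0.01],
})

# Select the market column by position (second column).
market = df.iloc[:, 1]

# Loop over every column after the market column, also chosen by position.
# df.columns[2:] is evaluated once, so the new columns are not revisited.
for name in df.columns[2:]:
    df["Excess_" + name] = df[name] - market

print(df)
```

Selecting by position with iloc and df.columns[2:] avoids typing each label, while the new columns still carry the original labels with the prefix.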

For the correlation, you can use pandas.DataFrame.corr()

In [12]: data.corr()
Out[12]: 
       s1  s2  s3  s2_s1  s3_s1  s3_s2
s1      1   1   1      1      1      1
s2      1   1   1      1      1      1
s3      1   1   1      1      1      1
s2_s1   1   1   1      1      1      1
s3_s1   1   1   1      1      1      1
s3_s2   1   1   1      1      1      1
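For the correlation between just two series, Series.corr() can be used instead of the full matrix. The sketch below uses made-up values and hypothetical column names. It also restricts the matrix to the "Excess_" columns before calling corr(), because newer pandas versions may raise an error when DataFrame.corr() is called on a frame that still contains a string column (passing numeric_only=True is another way around that).

```python
import pandas as pd

# Hypothetical excess-return columns plus one non-numeric column.
df = pd.DataFrame({
    "Excess_a": [1.0, 2.0, 3.0, 4.0],
    "Excess_b": [2.0, 4.0, 6.0, 8.0],
    "Excess_c": [4.0, 3.0, 2.0, 1.0],
    "label": list("wxyz"),
})

# Correlation between two specific series (Pearson by default).
pair = df["Excess_a"].corr(df["Excess_b"])

# Correlation matrix for the excess columns only:
# filter(like=...) keeps columns whose label contains the given string.
excess = df.filter(like="Excess_")
matrix = excess.corr()

print(pair)
print(matrix)
```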

Answer 1: (score: 1)

Here is an alternative that avoids the loop.

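See the use of diff() with axis=1. Note that diff() subtracts from each column the column immediately to its left, so it gives differences between adjacent columns; subtracting one fixed column from all the others is handled by delta2 further down.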
import pandas as pd
import numpy as np
rng = pd.date_range('2015-1-1',periods=12, freq='m')
data = pd.DataFrame(np.random.rand(12,8),index = rng)
data.index.name = 'month'
delta = data.diff(axis=1).iloc[:,1:]
delta.columns = ['Excess_' + str(col) for col in  delta.columns]
data.join(delta)

            0           1           2           3           4           5           6          7          Excess_1      Excess_2    Excess_3    Excess_4    Excess_5    Excess_6    Excess_7
month                                                           
2015-01-31  0.995529    0.528600    0.165824    0.903643    0.392386    0.997586    0.532741    0.465801    -0.466929   -0.362776   0.737819    -0.511257   0.605200    -0.464845   -0.066939
2015-02-28  0.105747    0.507735    0.264120    0.911261    0.961350    0.139388    0.756352    0.241203    0.401989    -0.243615   0.647140    0.050090    -0.821962   0.616964    -0.515149
2015-03-31  0.239546    0.537783    0.710753    0.317866    0.194260    0.774347    0.026830    0.652135    0.298237    0.172970    -0.392887   -0.123606   0.580087    -0.747517   0.625305
2015-04-30  0.453483    0.470196    0.340318    0.570760    0.163147    0.125921    0.074989    0.082275    0.016714    -0.129878   0.230442    -0.407613   -0.037226   -0.050933   0.007287
2015-05-31  0.099153    0.182511    0.676164    0.036362    0.026314    0.274792    0.961327    0.162986    0.083357    0.493653    -0.639801   -0.010049   0.248479    0.686534    -0.798341
2015-06-30  0.929498    0.401576    0.682311    0.831759    0.338765    0.147514    0.208116    0.358427    -0.527922   0.280735    0.149448    -0.492994   -0.191251   0.060603    0.150311
2015-07-31  0.030018    0.320987    0.031405    0.248800    0.988799    0.202371    0.882598    0.384514    0.290969    -0.289582   0.217395    0.739999    -0.786428   0.680226    -0.498083
2015-08-31  0.147542    0.672995    0.318547    0.279269    0.489103    0.808526    0.225413    0.004063    0.525453    -0.354447   -0.039278   0.209834    0.319423    -0.583114   -0.221349
2015-09-30  0.663309    0.784415    0.460139    0.792484    0.114094    0.731929    0.810777    0.381041    0.121106    -0.324276   0.332345    -0.678390   0.617835    0.078848    -0.429736
2015-10-31  0.638421    0.705389    0.022883    0.147137    0.876246    0.868816    0.902057    0.030144    0.066968    -0.682506   0.124254    0.729109    -0.007430   0.033241    -0.871913
2015-11-30  0.468480    0.888482    0.061717    0.352941    0.508728    0.905883    0.267931    0.680066    0.420003    -0.826766   0.291225    0.155786    0.397155    -0.637952   0.412135
2015-12-31  0.373209    0.891520    0.915866    0.979559    0.718712    0.421039    0.182262    0.460243    0.518311    0.024345    0.063693    -0.260847   -0.297673   -0.238777   0.277982


# If you instead want to subtract column 0 from each of columns 1 to 7,
# we will call the result delta2.
# I like to use the methods add(), sub(), mul(), etc.
# The key point is that data[0] is a Series that is broadcast across the
# frame's columns, while axis=0 aligns it with the frame on the row index labels.
delta2 = data.iloc[:,1:].sub(data[0],axis=0)
delta2.columns = ['Excess_' + str(col) for col in  delta2.columns]
data.join(delta2)
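Putting the pieces together for the original question, here is a rough end-to-end sketch: subtract the market column from every sector column, prefix the results with "Excess_", and correlate the excess returns. The column names, the random data, and the seed are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly returns: one market column and three sector columns.
rng = np.random.default_rng(0)
months = pd.date_range("2015-01-31", periods=12, freq=pd.offsets.MonthEnd())
returns = pd.DataFrame(
    rng.normal(0.0, 0.05, size=(12, 4)),
    index=months,
    columns=["market", "tech", "energy", "health"],
)
returns.index.name = "month"

# Subtract the market from each sector; axis=0 aligns on the row index.
excess = (
    returns.drop(columns="market")
    .sub(returns["market"], axis=0)
    .add_prefix("Excess_")
)

# Original returns alongside the excess returns.
result = returns.join(excess)

# Pairwise correlations among the excess-return columns only.
excess_corr = excess.corr()

print(result.head())
print(excess_corr)
```

add_prefix() does the same renaming as the list comprehension over the column labels above, and calling corr() on excess alone keeps the raw return columns out of the matrix.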