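I have monthly returns for the market and for several industries. I want to compute industry excess returns by subtracting the market return from each industry return, and then compute the correlations between all of the industry excess returns.
I have a data frame with 9 series (a monthly date series and 8 monthly return series). I want to form 7 new series by taking each of series 3 to 9 and subtracting series 2 (i.e. series 10 is series 3 - series 2, series 11 is series 4 - series 2, and so on). The new series should have the same labels as the original series, prefixed with "Excess". I can do this one series at a time, i.e. df["series10"] = df["series3"] - df["series2"], but how do I use a "for" statement and refer to the series by number? Also, how do I compute the correlation of two series in a data frame? Thanks in advance.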
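Answer 0 (score: 2)
You can use a simple concat to put all of the series into one data frame, then use a for loop to perform the operation and create the new columns you need.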
In [1]: import pandas as pd
# Create dummy data for illustration
In [2]: s_names = pd.Series(['a','b','c','d','f'], name = 'name')
In [3]: s1 = pd.Series([1,2,3,4,5], name = 's1')
In [4]: s2 = pd.Series([10,20,30,40,50], name='s2')
In [5]: s3 = pd.Series([100,200,300,400,500], name='s3')
# Use concat to create a new dataframe consisting of all series
In [6]: data = pd.concat([s_names, s1, s2, s3], axis=1)
In [7]: data_columns = data.columns
In [8]: from itertools import combinations
# Generate the pairs of columns on which to perform the operation.
# You can also use a custom list here.
In [9]: comb = list(combinations(data_columns[1:], 2))
In [10]: for c in comb:
    ...:     data[c[1] + "_" + c[0]] = data[c[1]] - data[c[0]]
In [11]: data
Out[11]:
  name  s1  s2   s3  s2_s1  s3_s1  s3_s2
0    a   1  10  100      9     99     90
1    b   2  20  200     18    198    180
2    c   3  30  300     27    297    270
3    d   4  40  400     36    396    360
4    f   5  50  500     45    495    450
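For correlations, you can use pandas.DataFrame.corr(). The non-numeric name column is dropped first, because pandas 2.0 and later no longer skip non-numeric columns silently. Every correlation below is 1 only because the dummy columns are exact multiples of each other.
In [12]: data.drop(columns='name').corr()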
Out[12]:
       s1  s2  s3  s2_s1  s3_s1  s3_s2
s1      1   1   1      1      1      1
s2      1   1   1      1      1      1
s3      1   1   1      1      1      1
s2_s1   1   1   1      1      1      1
s3_s1   1   1   1      1      1      1
s3_s2   1   1   1      1      1      1
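The question asked specifically about looping over the series by number and subtracting one fixed series. The following is a minimal sketch of that, assuming the market return sits in the second column and the seven industries follow it. The column names (date, market, industry1 to industry7) and the random returns are hypothetical placeholders, not data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: a date column, one market column, seven industry columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({"date": pd.date_range("2015-01-01", periods=12, freq="MS")})
df["market"] = rng.normal(0.01, 0.04, size=12)
for i in range(1, 8):
    df[f"industry{i}"] = rng.normal(0.01, 0.05, size=12)

# Refer to the columns by position: position 1 is the market,
# positions 2..8 are the industries (positions are zero-based).
market = df.iloc[:, 1]
for pos in range(2, 9):
    name = df.columns[pos]
    df["Excess_" + name] = df.iloc[:, pos] - market

# Correlation between two series in the frame.
pair_corr = df["Excess_industry1"].corr(df["Excess_industry2"])

# Correlation matrix of all excess-return columns.
excess_corr = df.filter(like="Excess_").corr()
```

New columns are appended at the end of the frame, so the positions 2 to 8 still point at the original industry columns throughout the loop.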
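Answer 1 (score: 1)
Here is an alternative that avoids the loop.
See the use of diff() with axis=1. Note that diff(axis=1) subtracts from each column its left-hand neighbor (column k minus column k-1), not one fixed column; the second snippet further down subtracts a single column from all the others, which is what the question asks for.
import pandas as pd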
import numpy as np
# Month-end dates; pandas 2.2 and later prefer freq='ME' over 'M'
rng = pd.date_range('2015-1-1', periods=12, freq='M')
data = pd.DataFrame(np.random.rand(12,8),index = rng)
data.index.name = 'month'
# The first column of the diff is all NaN (it has no left-hand neighbor), so drop it
delta = data.diff(axis=1).iloc[:,1:]
delta.columns = ['Excess_' + str(col) for col in delta.columns]
data.join(delta)
0 1 2 3 4 5 6 7 Excess_1 Excess_2 Excess_3 Excess_4 Excess_5 Excess_6 Excess_7
month
2015-01-31 0.995529 0.528600 0.165824 0.903643 0.392386 0.997586 0.532741 0.465801 -0.466929 -0.362776 0.737819 -0.511257 0.605200 -0.464845 -0.066939
2015-02-28 0.105747 0.507735 0.264120 0.911261 0.961350 0.139388 0.756352 0.241203 0.401989 -0.243615 0.647140 0.050090 -0.821962 0.616964 -0.515149
2015-03-31 0.239546 0.537783 0.710753 0.317866 0.194260 0.774347 0.026830 0.652135 0.298237 0.172970 -0.392887 -0.123606 0.580087 -0.747517 0.625305
2015-04-30 0.453483 0.470196 0.340318 0.570760 0.163147 0.125921 0.074989 0.082275 0.016714 -0.129878 0.230442 -0.407613 -0.037226 -0.050933 0.007287
2015-05-31 0.099153 0.182511 0.676164 0.036362 0.026314 0.274792 0.961327 0.162986 0.083357 0.493653 -0.639801 -0.010049 0.248479 0.686534 -0.798341
2015-06-30 0.929498 0.401576 0.682311 0.831759 0.338765 0.147514 0.208116 0.358427 -0.527922 0.280735 0.149448 -0.492994 -0.191251 0.060603 0.150311
2015-07-31 0.030018 0.320987 0.031405 0.248800 0.988799 0.202371 0.882598 0.384514 0.290969 -0.289582 0.217395 0.739999 -0.786428 0.680226 -0.498083
2015-08-31 0.147542 0.672995 0.318547 0.279269 0.489103 0.808526 0.225413 0.004063 0.525453 -0.354447 -0.039278 0.209834 0.319423 -0.583114 -0.221349
2015-09-30 0.663309 0.784415 0.460139 0.792484 0.114094 0.731929 0.810777 0.381041 0.121106 -0.324276 0.332345 -0.678390 0.617835 0.078848 -0.429736
2015-10-31 0.638421 0.705389 0.022883 0.147137 0.876246 0.868816 0.902057 0.030144 0.066968 -0.682506 0.124254 0.729109 -0.007430 0.033241 -0.871913
2015-11-30 0.468480 0.888482 0.061717 0.352941 0.508728 0.905883 0.267931 0.680066 0.420003 -0.826766 0.291225 0.155786 0.397155 -0.637952 0.412135
2015-12-31 0.373209 0.891520 0.915866 0.979559 0.718712 0.421039 0.182262 0.460243 0.518311 0.024345 0.063693 -0.260847 -0.297673 -0.238777 0.277982
# If you want to subtract column 0 from each of columns 1 to 7 instead,
# we will call the result delta2.
# I like to use the arithmetic methods: add(), sub(), mul(), etc.
# The key point is that data[0] is a Series that is broadcast across the frame,
# with axis=0 aligning it on the row index labels.
delta2 = data.iloc[:,1:].sub(data[0],axis=0)
delta2.columns = ['Excess_' + str(col) for col in delta2.columns]
data.join(delta2)
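The second part of the question, the correlations among the excess returns, can be handled on the same kind of frame. This is a sketch under the assumption that column 0 stands in for the market and columns 1 to 7 for the industries; the random values are placeholders, and add_prefix() is used here as a shorter way to build the "Excess_" labels.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Placeholder data: column 0 plays the market, columns 1-7 the industries.
data = pd.DataFrame(rng.random((12, 8)))

# Subtract the market column from every industry column, aligned on the row index,
# and prefix the resulting column labels with "Excess_".
excess = data.iloc[:, 1:].sub(data[0], axis=0).add_prefix("Excess_")

# Pairwise Pearson correlations between all excess-return columns (7 x 7).
corr_matrix = excess.corr()

# Correlation between just two of the series.
one_pair = excess["Excess_1"].corr(excess["Excess_2"])
```

DataFrame.corr() defaults to the Pearson coefficient; passing method='spearman' or method='kendall' gives rank-based alternatives.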