I have a DataFrame with 262,800 rows and 3 columns that currently looks like this:
Currency Maturity value
0 GBP 0.08333333 4.709456
1 GBP 0.08333333 4.713099
2 GBP 0.08333333 4.707237
3 GBP 0.08333333 4.705043
4 GBP 0.08333333 4.697150
5 GBP 0.08333333 4.710647
6 GBP 0.08333333 4.701150
7 GBP 0.08333333 4.694639
8 GBP 0.08333333 4.686111
9 GBP 0.08333333 4.714750
......
262770 GBP 25 2.432869
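I would like the DataFrame to be in the format shown below. I have tried a few things, including the `melt` call in the code below, but for some reason it drops my Date column and produces the DataFrame above. I am not sure how to keep the date column and get the DataFrame below: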
Maturity Date Currency Yield_pct
0 0.08333333 2005-01-04 GBP 4.709456
1 0.08333333 2005-01-05 GBP 4.713099
2 0.08333333 2005-01-06 GBP 4.707237
....
9 25 2005-01-04 GBP 2.432869
My code is as follows:
from pandas.io.excel import read_excel
import pandas as pd
import numpy as np
url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'
# check the sheet number, spot: 9/9, short end 7/9
# note: the keyword is spelled `sheet_name` in current pandas (older releases used `sheetname`)
spot_curve = read_excel(url, sheet_name=8)
short_end_spot_curve = read_excel(url, sheet_name=6)
# do some cleaning, keep NaN for now, as forward fill NaN is not recommended for yield curve
spot_curve.columns = spot_curve.loc['years:']
spot_curve.columns.name = 'Maturity'
valid_index = spot_curve.index[4:]
spot_curve = spot_curve.loc[valid_index]
# remove all maturities within 5 years as those are duplicated in short-end file
col_mask = spot_curve.columns.values > 5
spot_curve = spot_curve.iloc[:, col_mask]
short_end_spot_curve.columns = short_end_spot_curve.loc['years:']
short_end_spot_curve.columns.name = 'Maturity'
valid_index = short_end_spot_curve.index[4:]
short_end_spot_curve = short_end_spot_curve.loc[valid_index]
# merge these two, time index are identical
# ==============================================
combined_data = pd.concat([short_end_spot_curve, spot_curve], axis=1, join='outer')
# sort the maturity from short end to long end
combined_data.sort_index(axis=1, inplace=True)
def filter_func(group):
    return group.isnull().sum(axis=1) <= 50
combined_data = combined_data.groupby(level=0).filter(filter_func)
idx = 0
values = ['GBP'] * len(combined_data.index)
combined_data.insert(idx, 'Currency', values)
#print combined_data.columns.values
#I had to do the melt
# Arbitrarily melted on 'Currency', because for some reason when I do
# print combined_data.columns.values I see that 'Currency' corresponds to 0.08333333, etc.
combined_data = pd.melt(combined_data, id_vars=['Currency'])
print(combined_data)
Answer 0: (score: 2)
Could you add the currency identifier after the `melt`?
# Copy up to this stage
combined_data = combined_data.groupby(level=0).filter(filter_func)
# My code from here
combined_data.reset_index(inplace=True, drop=False)
combined_data.rename(columns={'index': 'Date'}, inplace=True)
# This line assumes you want datetime, ignore if you don't
combined_data['Date'] = pd.to_datetime(combined_data['Date'])
result = pd.melt(combined_data, id_vars=['Date'])
result['Currency'] = 'GBP'
result.head()
Date Maturity value Currency
0 2005-01-04 0.08333333 4.709456 GBP
1 2005-01-05 0.08333333 4.713099 GBP
2 2005-01-06 0.08333333 4.707237 GBP
3 2005-01-07 0.08333333 4.705043 GBP
4 2005-01-10 0.08333333 4.697150 GBP
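The Date column disappears in the question because the dates live in the index, and `pd.melt` does not carry the index over into its result. Below is a minimal, self-contained sketch of the approach above on a small made-up frame. The names `wide`, `flat` and `tidy` and the numbers are illustrative assumptions, not data from the Bank of England file. It also passes `value_name` and reorders the columns to match the layout asked for in the question.

```python
import pandas as pd

# Toy stand-in for combined_data: dates in the index, maturities as columns.
# The values are illustrative only.
wide = pd.DataFrame(
    {0.08333333: [4.709456, 4.713099], 25.0: [2.432869, 2.440000]},
    index=pd.to_datetime(["2005-01-04", "2005-01-05"]),
)
wide.columns.name = "Maturity"

# Melting directly loses the dates, because they are in the index.
print(pd.melt(wide).columns.tolist())

# Move the dates into a regular column first, then melt on that column.
flat = wide.reset_index().rename(columns={"index": "Date"})
tidy = pd.melt(flat, id_vars=["Date"], var_name="Maturity", value_name="Yield_pct")

# Add the constant currency label afterwards and put the columns in the requested order.
tidy["Currency"] = "GBP"
tidy = tidy[["Maturity", "Date", "Currency", "Yield_pct"]]
print(tidy)
```

Adding `Currency` after the melt keeps it out of the set of columns being unpivoted, which is what went wrong in the original attempt.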
Answer 1: (score: 0)
Try resetting the index first and then stacking the result, so that the currency is included.
cd = combined_data.reset_index().set_index(['index', 'Currency'])
cd_new = cd.stack()
>>> cd_new
index Currency Maturity
2005-01-04 GBP 0.083333 4.709456
0.166667 4.633861
0.250000 4.586271
0.333333 4.567017
0.416667 4.559578
0.500000 4.553227
0.583333 4.543976
0.666667 4.530881
0.750000 4.514742
0.833333 4.497187
0.916667 4.479690
1.000000 4.463105
1.083333 4.447843
1.166667 4.434076
1.250000 4.421868
...
2015-05-29 GBP 18.0 2.453898
18.5 2.475052
19.0 2.494679
19.5 2.512787
20.0 2.529393
20.5 2.544519
21.0 2.558198
21.5 2.570467
22.0 2.581368
22.5 2.590947
23.0 2.599250
23.5 2.606327
24.0 2.612229
24.5 2.617008
25.0 2.620715
Length: 259457, dtype: float64
cd_new.xs('2015-05-29')
Currency Maturity
GBP 0.333333 0.452339
0.416667 0.441134
0.500000 0.430168
0.583333 0.419990
0.666667 0.411208
0.750000 0.404424
0.833333 0.400017
0.916667 0.398140
1.000000 0.398806
1.083333 0.401943
1.166667 0.407427
1.250000 0.415095
1.333333 0.424762
1.416667 0.436233
1.500000 0.449322
...
GBP 18.0 2.453898
18.5 2.475052
19.0 2.494679
19.5 2.512787
20.0 2.529393
20.5 2.544519
21.0 2.558198
21.5 2.570467
22.0 2.581368
22.5 2.590947
23.0 2.599250
23.5 2.606327
24.0 2.612229
24.5 2.617008
25.0 2.620715
Length: 97, dtype: float64
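If a flat table is wanted rather than a MultiIndexed Series, the stacked result can be turned back into ordinary columns with `reset_index`. The following is only a sketch on made-up data: the `wide` frame, its values, and the `Date` index name are assumptions for illustration, and the toy data deliberately contains no NaN so the result does not depend on how a given pandas version treats missing values in `stack`.

```python
import pandas as pd

# Toy stand-in for combined_data after the 'Currency' column was inserted.
wide = pd.DataFrame(
    {0.5: [4.55, 4.56], 1.0: [4.46, 4.47]},
    index=pd.to_datetime(["2005-01-04", "2005-01-05"]),
)
wide.columns.name = "Maturity"
wide.insert(0, "Currency", "GBP")

# Name the index so reset_index() yields a 'Date' column instead of 'index',
# then stack the maturity columns into a third index level.
stacked = (
    wide.rename_axis("Date")
        .reset_index()
        .set_index(["Date", "Currency"])
        .stack()
)

# Name the values and flatten the three index levels back into columns.
tidy = stacked.rename("Yield_pct").reset_index()
tidy = tidy[["Maturity", "Date", "Currency", "Yield_pct"]]
print(tidy)
```

Note that `stack` orders the rows date by date, whereas `melt` orders them maturity by maturity. A `sort_values(["Maturity", "Date"])` call on the result would give the ordering shown in the question.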