Restructuring a DataFrame

Time: 2015-06-19 15:58:32

Tags: python pandas

I have a DataFrame with 262800 rows and 3 columns. It currently looks like this:

       Currency    Maturity     value
0           GBP  0.08333333  4.709456
1           GBP  0.08333333  4.713099
2           GBP  0.08333333  4.707237
3           GBP  0.08333333  4.705043
4           GBP  0.08333333  4.697150
5           GBP  0.08333333  4.710647
6           GBP  0.08333333  4.701150
7           GBP  0.08333333  4.694639
8           GBP  0.08333333  4.686111
9           GBP  0.08333333  4.714750
......
262770      GBP          25  2.432869

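I would like the DataFrame to look like the one below. I have taken some steps towards this, including using melt in the code below, but for some reason it gets rid of my Date column and produces the DataFrame above. I'm not sure how to keep the Date column and get the DataFrame below: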

   Maturity     Date            Currency  Yield_pct
0  0.08333333   2005-01-04      GBP       4.709456              
1  0.08333333   2005-01-05      GBP       4.713099               
2  0.08333333   2005-01-06      GBP       4.707237
....
9  25           2005-01-04      GBP       2.432869

My code is as follows:

from pandas.io.excel import read_excel
import pandas as pd
import numpy as np

url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'

# check the sheet number, spot: 9/9, short end 7/9
spot_curve = read_excel(url, sheet_name=8)
short_end_spot_curve = read_excel(url, sheet_name=6)

# do some cleaning, keep NaN for now, as forward fill NaN is not recommended for yield curve
spot_curve.columns = spot_curve.loc['years:']
spot_curve.columns.name = 'Maturity'
valid_index = spot_curve.index[4:]
spot_curve = spot_curve.loc[valid_index]
# remove all maturities within 5 years as those are duplicated in short-end file
col_mask = spot_curve.columns.values > 5
spot_curve = spot_curve.iloc[:, col_mask]


short_end_spot_curve.columns = short_end_spot_curve.loc['years:']
short_end_spot_curve.columns.name = 'Maturity'
valid_index = short_end_spot_curve.index[4:]
short_end_spot_curve = short_end_spot_curve.loc[valid_index]

# merge these two, time index are identical
# ==============================================
combined_data = pd.concat([short_end_spot_curve, spot_curve], axis=1, join='outer')
# sort the maturity from short end to long end
combined_data.sort_index(axis=1, inplace=True)

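def filter_func(group):
    # keep a date only if it has at most 50 missing maturities;
    # groupby().filter() expects one boolean per group, hence .all()
    return (group.isnull().sum(axis=1) <= 50).all()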

combined_data = combined_data.groupby(level=0).filter(filter_func)

idx = 0
values = ['GBP'] * len(combined_data.index)
combined_data.insert(idx, 'Currency', values) 

# print(combined_data.columns.values)

# I had to do the melt.
# I melted on 'Currency' somewhat arbitrarily: for some reason, when I print
# combined_data.columns.values, I see that 'Currency' corresponds to 0.08333333, etc.
combined_data = pd.melt(combined_data, id_vars=['Currency'])
print(combined_data)
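Why the Date column disappears: pd.melt builds a new default index and only carries over the id_vars plus the melted columns, so anything stored in the index (here, the dates) is dropped. The sketch below reproduces the symptom on hypothetical toy data (two dates, two maturities, illustrative yields) and shows two ways around it; the ignore_index=False variant assumes pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical toy stand-in for combined_data: dates on the index,
# maturities as column labels (columns.name = 'Maturity'), plus a Currency column.
dates = pd.to_datetime(['2005-01-04', '2005-01-05'])
wide = pd.DataFrame(
    {0.08333333: [4.709456, 4.713099], 25.0: [2.432869, 2.431000]},  # illustrative yields
    index=dates,
)
wide.columns.name = 'Maturity'
wide.insert(0, 'Currency', 'GBP')

# Default melt: the result gets a fresh RangeIndex, so the dates are lost.
melted = pd.melt(wide, id_vars=['Currency'])

# Fix 1: move the index into an ordinary 'Date' column and use it as an id variable.
with_date = pd.melt(wide.rename_axis('Date').reset_index(), id_vars=['Date', 'Currency'])

# Fix 2 (pandas >= 1.1): ask melt to keep the original index instead.
kept = pd.melt(wide, id_vars=['Currency'], ignore_index=False)

print(melted.columns.tolist())     # ['Currency', 'Maturity', 'value']
print(with_date.columns.tolist())  # ['Date', 'Currency', 'Maturity', 'value']
```

The melted variable column is called Maturity because melt takes its default var_name from columns.name, which the original code sets.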

2 Answers:

Answer 0 (score: 2)

Could you add the currency identifier after the melt instead?

# Copy up to this stage
combined_data = combined_data.groupby(level=0).filter(filter_func)

# My code from here
combined_data.reset_index(inplace=True, drop=False)
combined_data.rename(columns={'index': 'Date'}, inplace=True)

# This line assumes you want datetime, ignore if you don't
combined_data['Date'] = pd.to_datetime(combined_data['Date'])

result = pd.melt(combined_data, id_vars=['Date'])

result['Currency'] = 'GBP'

result.head()

Output:
    Date    Maturity    value   Currency
0   2005-01-04  0.08333333  4.709456    GBP
1   2005-01-05  0.08333333  4.713099    GBP
2   2005-01-06  0.08333333  4.707237    GBP
3   2005-01-07  0.08333333  4.705043    GBP
4   2005-01-10  0.08333333  4.697150    GBP
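The same steps can be written as one method chain. This is a sketch on hypothetical toy data (the second 25-year yield is made up); beyond the answer, it names the value column Yield_pct and reorders the columns to match the layout requested in the question. Passing var_name='Maturity' explicitly avoids relying on the column axis still carrying that name.

```python
import pandas as pd

# Hypothetical stand-in for combined_data after the NaN filter:
# an unnamed index of date strings and maturities as column labels.
wide = pd.DataFrame(
    {0.08333333: [4.709456, 4.713099], 25.0: [2.432869, 2.431000]},
    index=['2005-01-04', '2005-01-05'],
)

long_df = (
    wide.reset_index()                      # unnamed index -> column 'index'
    .rename(columns={'index': 'Date'})
    .assign(Date=lambda d: pd.to_datetime(d['Date']))
    .melt(id_vars=['Date'], var_name='Maturity', value_name='Yield_pct')
    .assign(Currency='GBP')                 # single currency in this file
    .loc[:, ['Maturity', 'Date', 'Currency', 'Yield_pct']]  # order from the question
)
print(long_df)
```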

Answer 1 (score: 0)

Try resetting the index first, then stacking the result so that it includes the currency:

cd = combined_data.reset_index().set_index(['index', 'Currency'])
cd_new = cd.stack()
>>> cd_new
index       Currency  Maturity
2005-01-04  GBP       0.083333    4.709456
                      0.166667    4.633861
                      0.250000    4.586271
                      0.333333    4.567017
                      0.416667    4.559578
                      0.500000    4.553227
                      0.583333    4.543976
                      0.666667    4.530881
                      0.750000    4.514742
                      0.833333    4.497187
                      0.916667    4.479690
                      1.000000    4.463105
                      1.083333    4.447843
                      1.166667    4.434076
                      1.250000    4.421868
...
2015-05-29  GBP       18.0        2.453898
                      18.5        2.475052
                      19.0        2.494679
                      19.5        2.512787
                      20.0        2.529393
                      20.5        2.544519
                      21.0        2.558198
                      21.5        2.570467
                      22.0        2.581368
                      22.5        2.590947
                      23.0        2.599250
                      23.5        2.606327
                      24.0        2.612229
                      24.5        2.617008
                      25.0        2.620715
Length: 259457, dtype: float64
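The stacked result is a Series with a three-level index. To get the flat table the question asks for, the index levels can be turned back into columns. A minimal sketch on hypothetical toy data, assuming the index is named Date instead of the answer's default 'index'; the explicit dropna() is there because newer pandas versions may keep NaN entries when stacking.

```python
import numpy as np
import pandas as pd

# Hypothetical toy version of combined_data with the Currency column already inserted;
# the NaN stands in for a maturity with no published yield on that date.
wide = pd.DataFrame(
    {0.08333333: [4.709456, np.nan], 25.0: [2.432869, 2.620715]},
    index=pd.to_datetime(['2005-01-04', '2015-05-29']),
)
wide.insert(0, 'Currency', 'GBP')

# As in the answer: move Date and Currency into the index...
cd = wide.rename_axis('Date').reset_index().set_index(['Date', 'Currency'])
cd = cd.rename_axis(columns='Maturity')  # name the level that stack() creates

# ...stack the maturities into rows, dropping missing yields explicitly...
stacked = cd.stack().dropna()

# ...and flatten the three index levels back into ordinary columns.
flat = stacked.reset_index(name='Yield_pct')
print(flat)
```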

>>> cd_new.xs('2015-05-29')
Currency  Maturity
GBP       0.333333    0.452339
          0.416667    0.441134
          0.500000    0.430168
          0.583333    0.419990
          0.666667    0.411208
          0.750000    0.404424
          0.833333    0.400017
          0.916667    0.398140
          1.000000    0.398806
          1.083333    0.401943
          1.166667    0.407427
          1.250000    0.415095
          1.333333    0.424762
          1.416667    0.436233
          1.500000    0.449322
...
GBP       18.0        2.453898
          18.5        2.475052
          19.0        2.494679
          19.5        2.512787
          20.0        2.529393
          20.5        2.544519
          21.0        2.558198
          21.5        2.570467
          22.0        2.581368
          22.5        2.590947
          23.0        2.599250
          23.5        2.606327
          24.0        2.612229
          24.5        2.617008
          25.0        2.620715
Length: 97, dtype: float64
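In the 2015-05-29 cross-section the first maturity shown is 0.333333, while the 2005-01-04 listing starts at 0.083333. A likely explanation is that the shortest maturities are NaN on that date and stack() (in the pandas versions of the time) drops NaN entries by default, which would also explain why Length: 259457 is smaller than the 262800 rows the melt produced. A quick way to check, sketched on hypothetical toy data built from yields shown above and using the answer's 'index' level name:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the 0.08333333 yield is missing on 2015-05-29.
wide = pd.DataFrame(
    {0.08333333: [4.709456, np.nan], 0.33333333: [4.567017, 0.452339]},
    index=pd.to_datetime(['2005-01-04', '2015-05-29']),
).rename_axis(index='index', columns='Maturity')

# Count missing maturities per date; these are the entries that disappear when stacking.
missing_per_date = wide.isna().sum(axis=1)

# Stack (dropping NaN explicitly, as older pandas did by default) and take one date.
stacked = wide.stack().dropna()
one_day = stacked.xs(pd.Timestamp('2015-05-29'), level='index')
print(missing_per_date)
print(one_day)
```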