Roll-off profile: stacking data frames

Time: 2018-02-10 20:23:38

Tags: python-3.x pandas

My data frame looks like this:

import pandas as pd
import datetime as dt

df= pd.DataFrame({'date':['2017-12-31','2017-12-31'],'type':['Asset','Liab'],'Amount':[100,-100],'Maturity Date':['2019-01-02','2018-01-01']})

df

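I am trying to build a roll-off profile by checking whether the 'Maturity Date' is greater than a 'date' in the future. I am trying to achieve something like the following: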

#First month
df1 = df[df['Maturity Date'] > '2018-01-31'].copy()  # .copy() avoids SettingWithCopyWarning
df1['date'] = '2018-01-31'

#Second month
df2 = df[df['Maturity Date'] > '2018-02-28'].copy()
df2['date'] = '2018-02-28'

#Third month
df3 = df[df['Maturity Date'] > '2018-03-31'].copy()
df3['date'] = '2018-03-31'

#First quarterly date
qf1 = df[df['Maturity Date'] > '2018-06-30'].copy()
qf1['date'] = '2018-06-30'


#concatenate
df = pd.concat([df, df1, df2, df3, qf1])


df

I was wondering whether there is a way to:

allow an arbitrarily long list of dates without repeating the code

2 Answers:

Answer 0 (score: 2)

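You can use a nifty tool in the pandas arsenal, pd.merge_asof. It works like pd.merge, except that it matches on the "nearest" key rather than on equal keys. In addition, you can tell pd.merge_asof to look for the nearest key only in the backward or only in the forward direction.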
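To make the two directions concrete, here is a minimal sketch on made-up toy data (the frames `left` and `right` and their column names are assumptions for illustration, not part of the question). Both frames have to be sorted by their key, and each left row gets at most one match.

```python
import pandas as pd

# Hypothetical toy data; both frames must already be sorted by their key.
left = pd.DataFrame({'key': [1, 5, 10]})
right = pd.DataFrame({'rkey': [2, 3, 7], 'val': ['a', 'b', 'c']})

# backward (the default): take the last right key that is <= the left key
backward = pd.merge_asof(left, right, left_on='key', right_on='rkey',
                         direction='backward')

# forward: take the first right key that is >= the left key
forward = pd.merge_asof(left, right, left_on='key', right_on='rkey',
                        direction='forward')

print(backward)
print(forward)
```

With direction='backward' the key 1 finds no match (nothing in `right` is <= 1), and with direction='forward' the key 10 finds none; unmatched rows are kept with NaN. Note that the result always has one row per left row, so only the single nearest right row is attached.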

To make things interesting (and to help check that things are working), let's add another row to df:

df = pd.DataFrame({'date':['2017-12-31', '2017-12-31'],'type':['Asset', 'Asset'],'Amount':[100,200],'Maturity Date':['2019-01-02', '2018-03-15']})
for col in ['date', 'Maturity Date']:
    df[col] = pd.to_datetime(df[col])
df = df.sort_values(by='Maturity Date')
print(df)
#    Amount Maturity Date       date   type
# 1     200    2018-03-15 2017-12-31  Asset
# 0     100    2019-01-02 2017-12-31  Asset

Now define some new dates:

dates = (pd.date_range('2018-01-31', periods=3, freq='M')
         .union(pd.date_range('2018-01-1', periods=2, freq='Q')))
result = pd.DataFrame({'date': dates})
#         date
# 0 2018-01-31
# 1 2018-02-28
# 2 2018-03-31
# 3 2018-06-30
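Since the question asks for an arbitrarily long list of dates, the schedule itself can be wrapped in a small function. The sketch below is one possible way to do it; `build_schedule` and its parameters are hypothetical names, not something from the answer. It passes offset objects instead of the alias strings 'M' and 'Q', which pandas 2.2 and later deprecate in favor of 'ME' and 'QE'.

```python
import pandas as pd

def build_schedule(first_month_end, n_months, n_quarters):
    """Hypothetical helper: n_months month-ends, then n_quarters quarter-ends
    on or after the last month-end, sorted and de-duplicated."""
    months = pd.date_range(first_month_end, periods=n_months,
                           freq=pd.offsets.MonthEnd())
    quarters = pd.date_range(months[-1], periods=n_quarters,
                             freq=pd.offsets.QuarterEnd())
    # union() sorts the dates and drops the overlap (here 2018-03-31)
    return months.union(quarters)

schedule = build_schedule('2018-01-31', n_months=3, n_quarters=2)
print(schedule)
```

With these arguments the schedule contains the same four dates as above: 2018-01-31, 2018-02-28, 2018-03-31 and 2018-06-30.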

Now we can merge the rows, matching the nearest dates from result with the Maturity Dates from df:

result = pd.merge_asof(result, df.drop('date', axis=1), left_on='date', right_on='Maturity Date', direction='forward')

In this case we want to "match" each date with a Maturity Date that is greater, so we use direction='forward'.

Putting it all together:

import pandas as pd

df = pd.DataFrame({'date':['2017-12-31', '2017-12-31'],'type':['Asset', 'Asset'],'Amount':[100,200],'Maturity Date':['2019-01-02', '2018-03-15']})
for col in ['date', 'Maturity Date']:
    df[col] = pd.to_datetime(df[col])
df = df.sort_values(by='Maturity Date')

dates = (pd.date_range('2018-01-31', periods=3, freq='M')
         .union(pd.date_range('2018-01-1', periods=2, freq='Q')))
result = pd.DataFrame({'date': dates})
result = pd.merge_asof(result, df.drop('date', axis=1), 
                       left_on='date', right_on='Maturity Date', direction='forward')

result = pd.concat([df, result], axis=0)
result = result.sort_values(by=['Maturity Date', 'date'])
print(result)

Answer 1 (score: 2)

I think you need numpy.tile to repeat the indices and assign the dates to a new column, then filter with boolean indexing and sort with sort_values:

import numpy as np
import pandas as pd

# df is the original data frame from the question
d = '2017-12-31'
df['Maturity Date'] = pd.to_datetime(df['Maturity Date'])

#generate the monthly dates and the following quarters
c1 = pd.date_range(d, periods=4, freq='M')
c2 = pd.date_range(c1[-1], periods=2, freq='Q')
#join together (c2[0] equals c1[-1], so skip it)
c = c1.union(c2[1:])

#repeat rows by indexing with the repeated index
df1 = df.loc[np.tile(df.index, len(c))].copy()
#assign column by datetimes
df1['date'] = np.repeat(c, len(df))
#filter by boolean indexing
df1 = df1[df1['Maturity Date'] > df1['date']]
print(df1)
   Amount Maturity Date       date   type
0     100    2019-01-02 2017-12-31  Asset
1    -100    2018-01-01 2017-12-31   Liab
0     100    2019-01-02 2018-01-31  Asset
0     100    2019-01-02 2018-02-28  Asset
0     100    2019-01-02 2018-03-31  Asset
0     100    2019-01-02 2018-06-30  Asset
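The same repeat-then-filter idea can also be written as a cross join, which may read more directly than the tile/repeat pair. This is only a sketch under assumptions: `roll_off` is a hypothetical helper name, and how='cross' requires pandas 1.2 or later.

```python
import pandas as pd

def roll_off(df, dates, maturity_col='Maturity Date', date_col='date'):
    """Hypothetical helper: one copy of every position for each reporting
    date on which it has not yet matured."""
    base = df.drop(columns=[date_col])
    schedule = pd.DataFrame({date_col: pd.to_datetime(list(dates))})
    # how='cross' (pandas >= 1.2) pairs every position with every date
    pairs = base.merge(schedule, how='cross')
    # keep only positions maturing strictly after the reporting date
    return pairs[pairs[maturity_col] > pairs[date_col]].reset_index(drop=True)

df = pd.DataFrame({'date': ['2017-12-31', '2017-12-31'],
                   'type': ['Asset', 'Liab'],
                   'Amount': [100, -100],
                   'Maturity Date': ['2019-01-02', '2018-01-01']})
df['Maturity Date'] = pd.to_datetime(df['Maturity Date'])

dates = ['2017-12-31', '2018-01-31', '2018-02-28', '2018-03-31', '2018-06-30']
profile = roll_off(df, dates).sort_values(['date', 'type'])
print(profile)
```

On the question's data this keeps the asset on all five dates and the liability only on 2017-12-31, the same six rows as in the output above. The list of dates can be as long as needed, so no per-date code has to be repeated.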