I have a pandas DataFrame like this:
             maturity  coupon  freq
0 2018-06-01 00:00:00       3     1
1 2017-10-01 00:00:00       2     1
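I would like a matrix whose first column contains these dates together with the dates 1, 2, ... years before them, and whose second column contains the number of days from 2016.03.04 to each date.
Like this: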
date                 number of days remaining
2016-06-01 00:00:00   89
2016-10-01 00:00:00  211
2017-06-01 00:00:00  454
2017-10-01 00:00:00  576
2018-06-01 00:00:00  819
Please help!
Answer 0 (score: 1)
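You can try creating a new DataFrame by appending the maturity Series, with a DateOffset subtracted, to a list dfs. Finally, you can subtract the datetime d and convert the resulting Timedelta to a number of days by dividing by np.timedelta64(1, 'D'):
import numpy as np
import pandas as pd
# df is the DataFrame from the question; its maturity column is assumed to be of datetime dtype
d = "2016.03.04"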
# collect the maturity column shifted back by 0, 1, 2, ... years
dfs = []
for i in range(5):
    years_before = df['maturity'] - pd.DateOffset(years=i)
    # keep only the dates after d
    dfs.append(years_before.loc[years_before > pd.to_datetime(d)])
df = pd.DataFrame(pd.concat(dfs, ignore_index=True))
print(df)
maturity
0 2018-06-01
1 2017-10-01
2 2017-06-01
3 2016-10-01
4 2016-06-01
Then compute the number of remaining days and sort by maturity:
# days from d to each date, as a number
df['remain'] = (df['maturity'] - pd.to_datetime(d)) / np.timedelta64(1, 'D')
# sort values by column maturity
df = df.sort_values('maturity')
print(df)
maturity remain
4 2016-06-01 89
3 2016-10-01 211
2 2017-06-01 454
1 2017-10-01 576
0 2018-06-01 819
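For reference, the steps above can be put together into one self-contained script. This is only a minimal sketch: it rebuilds the sample frame from the question, assumes the maturity column is already of datetime dtype, and the names `start`, `shifted` and `result` are introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question (maturity assumed to be datetime dtype)
df = pd.DataFrame({
    'maturity': pd.to_datetime(['2018-06-01', '2017-10-01']),
    'coupon': [3, 2],
    'freq': [1, 1],
})
start = pd.Timestamp('2016-03-04')  # illustrative name for the reference date

# Shift the maturities back by 0..4 years and keep only dates after start
shifted = []
for i in range(5):
    years_before = df['maturity'] - pd.DateOffset(years=i)
    shifted.append(years_before[years_before > start])

result = pd.concat(shifted, ignore_index=True).to_frame('maturity')
# Dividing a timedelta column by one day gives the day count as floats
result['remain'] = (result['maturity'] - start) / np.timedelta64(1, 'D')
result = result.sort_values('maturity').reset_index(drop=True)
print(result)
```

Unlike the answer's code, this sketch keeps the original `df` intact and writes the output to a separate frame.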
I tried to estimate the maximum number of loop iterations (not thoroughly tested):
# estimate the maximum number of years => loop iterations
maxYears = (df['maturity'].max() - pd.to_datetime(d)) / np.timedelta64(1, 'D') / (365.25)
print(maxYears)
2.24229979466
# int() truncates (2.999 => 2), so one year has to be added;
# add one more to be safe (leap years, the year length is only an estimate)
maxYears = int(maxYears) + 2
print(maxYears)
4
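The estimated bound could then replace the hard-coded range(5) in the loop. The following is a sketch under that assumption, not tested beyond the sample data; `dates_before` is a hypothetical helper name, and it uses `.dt.days` to get integer day counts, whereas dividing by np.timedelta64(1, 'D') yields floats.

```python
import numpy as np
import pandas as pd

def dates_before(maturity, start):
    """Hypothetical helper: each maturity plus its yearly anniversaries after start."""
    start = pd.Timestamp(start)
    # Estimate how many yearly steps are needed (a year is taken as 365.25 days)
    span_days = (maturity.max() - start) / np.timedelta64(1, 'D')
    max_years = max(int(span_days / 365.25) + 2, 1)  # always loop at least once

    parts = []
    for i in range(max_years):
        shifted = maturity - pd.DateOffset(years=i)
        parts.append(shifted[shifted > start])  # keep only dates after start

    out = pd.concat(parts, ignore_index=True).to_frame('maturity')
    out['remain'] = (out['maturity'] - start).dt.days  # integer days remaining
    return out.sort_values('maturity').reset_index(drop=True)

maturity = pd.Series(pd.to_datetime(['2018-06-01', '2017-10-01']), name='maturity')
out = dates_before(maturity, '2016-03-04')
print(out)
```

Deriving the bound from the data avoids silently missing earlier dates when a maturity lies more than five years after the reference date.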