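I can't solve this without a loop, and I have very long time-series data. I want to know the closest next maturity date based on the information available as of today. Example below; note that the next maturity date should be for that specific code. There must be a more Pythonic way to do this.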
date matdate code
2-Jan-2018 5-Jan-2018 A
3-Jan-2018 6-Jan-2018 A
8-Jan-2018 12-Jan-2018 B
10-Jan-2018 15-Jan-2018 A
11-Jan-2018 16-Jan-2018 B
15-Jan-2018 17-Jan-2018 A
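I'm looking for output in the following format, with every business day's date in the output (the output below could also be in pivot format, but it should have every business day's date as the index):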
date matdate code BusinessDaysToNextMat
2-Jan-2018 5-Jan-2018 A 3
2-Jan-2018 B 0
3-Jan-2018 8-Jan-2018 A 2
3-Jan-2018 B 0
4-Jan-2018 A 1
4-Jan-2018 B 0
5-Jan-2018 A 0
5-Jan-2018 B 0
8-Jan-2018 A 0
8-Jan-2018 17-Jan-2018 B 7
9-Jan-2018 A 0
9-Jan-2018 B 6
10-Jan-2018 16-Jan-2018 A 4
10-Jan-2018 B 6
11-Jan-2018 A 3
11-Jan-2018 16-Jan-2018 B 3
12-Jan-2018 A 4
12-Jan-2018 B 2
15-Jan-2018 17-Jan-2018 A 1
15-Jan-2018 B 1
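A minimal sketch of the rule stated above, answering a single (day, code) query. It assumes the input table is loaded into a DataFrame `df` with datetime columns `date` and `matdate`; the function name `business_days_to_next_mat` is hypothetical. A maturity counts as known on a given day if its row's `date` is on or before that day, and as the next maturity if it has not passed yet (`matdate >= day`); when nothing qualifies it returns 0. Because it is computed from the input table, a few rows may not match the hand-written expected output exactly.

```python
import numpy as np
import pandas as pd

# Input table from the question
df = pd.DataFrame({
    'date': pd.to_datetime(['2018-01-02', '2018-01-03', '2018-01-08',
                            '2018-01-10', '2018-01-11', '2018-01-15']),
    'matdate': pd.to_datetime(['2018-01-05', '2018-01-06', '2018-01-12',
                               '2018-01-15', '2018-01-16', '2018-01-17']),
    'code': ['A', 'A', 'B', 'A', 'B', 'A'],
})


def business_days_to_next_mat(df, day, code):
    """Business days from `day` to the nearest maturity of `code` that is
    already known on `day` (row date <= day) and not yet passed (matdate >= day).
    Returns 0 when no such maturity exists."""
    day = pd.Timestamp(day)
    known = df[(df['code'] == code) & (df['date'] <= day) & (df['matdate'] >= day)]
    if known.empty:
        return 0
    nearest = known['matdate'].min()
    # busday_count counts Mon-Fri days in [day, nearest)
    return int(np.busday_count(day.date(), nearest.date()))


print(business_days_to_next_mat(df, '2018-01-03', 'A'))  # -> 2
```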
Thanks a lot for your help!
Answer 0 (score: 0)
You can do this with numpy.busday_count:
import numpy as np
df['BusinessDaysToNextMat'] = df[['date', 'matdate']].apply(lambda x: np.busday_count(*x), axis=1)
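`np.busday_count` counts valid weekdays in the half-open interval `[begindates, enddates)`, so the maturity date itself is not counted. A vectorized sketch, assuming `date` and `matdate` are pandas datetime columns: casting both to `datetime64[D]` arrays first lets a single call handle the whole column without a row-wise `apply`. The two-row frame and the holiday date are illustrations only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2018-01-02', '2018-01-08']),
    'matdate': pd.to_datetime(['2018-01-05', '2018-01-12']),
})

# busday_count works on day-resolution dates, so cast pandas' datetime64[ns] to datetime64[D]
start = df['date'].values.astype('datetime64[D]')
end = df['matdate'].values.astype('datetime64[D]')

# One vectorized call for the whole column: counts Mon-Fri days in [start, end)
df['BusinessDaysToNextMat'] = np.busday_count(start, end)
print(df['BusinessDaysToNextMat'].tolist())  # -> [3, 4]

# Hypothetical holiday on 2018-01-03, excluded from the count
with_holiday = np.busday_count('2018-01-02', '2018-01-05', holidays=['2018-01-03'])
print(with_holiday)  # -> 2
```

The `holidays=` (and `weekmask=`) arguments are only needed if exchange holidays or a non-standard week should not count as business days.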
df
# date matdate code BusinessDaysToNextMat
#0 2018-01-01 2018-01-05 A 4
#1 2018-01-03 2018-01-06 A 3
#2 2018-01-08 2018-01-12 B 4
#3 2018-01-10 2018-01-15 A 3
#4 2018-01-11 2018-01-16 B 3
#5 2018-01-15 2018-01-17 A 2
#6 2018-01-20 2018-01-22 A 0
This doesn't seem to be exactly what you have in your example, but it gets most of the way there:
import pandas as pd

# every custom business day (freq='C', Mon-Fri by default) x every code
index = pd.MultiIndex.from_product(
    [pd.date_range(df['date'].min(), df['date'].max(), freq='C').values,
     df['code'].unique()],
    names=['date', 'code'])
resampled = pd.DataFrame(index=index).reset_index().merge(df, on=['date', 'code'], how='left')
calc = resampled.dropna().copy()  # copy() so the column assignment below does not raise SettingWithCopyWarning
calc['BusinessDaysToNextMat'] = calc[['date', 'matdate']].apply(lambda x: np.busday_count(*x), axis=1)
final = resampled.merge(calc, on=['date', 'code', 'matdate'], how='left')
final['BusinessDaysToNextMat'] = final['BusinessDaysToNextMat'].fillna(0)  # plain assignment instead of chained inplace fillna
final
# date code matdate BusinessDaysToNextMat
#0 2018-01-02 A 2018-01-05 3.0
#1 2018-01-02 B NaT 0.0
#2 2018-01-03 A 2018-01-06 3.0
#3 2018-01-03 B NaT 0.0
#4 2018-01-04 A NaT 0.0
#5 2018-01-04 B NaT 0.0
#6 2018-01-05 A NaT 0.0
#7 2018-01-05 B NaT 0.0
#8 2018-01-08 A NaT 0.0
#9 2018-01-08 B 2018-01-12 4.0
#10 2018-01-09 A NaT 0.0
#11 2018-01-09 B NaT 0.0
#12 2018-01-10 A 2018-01-15 3.0
#13 2018-01-10 B NaT 0.0
#14 2018-01-11 A NaT 0.0
#15 2018-01-11 B 2018-01-16 3.0
#16 2018-01-12 A NaT 0.0
#17 2018-01-12 B NaT 0.0
#18 2018-01-15 A 2018-01-17 2.0
#19 2018-01-15 B NaT 0.0
Answer 1 (score: 0)
Here is what I'm currently doing, which is clearly not the most efficient approach:
import datetime
import numpy as np

# Step 1: make a new df (temp_df) with the data of just one code and fill up any blank matdates with the very first available matdate. After that:
temp_df['newmatdate'] = datetime.date(2014, 1, 1)  # temp column to hold the current minimum maturity date
temp_df['BusinessDaysToNextMat'] = 0  # this is the column we are after
mindates = []  # list of maturity dates that have come up so far, kept min-sorted
mindates.append(dummy)  # dummy is the very first available maturity date (on the first date we only know this one trade); the code producing it is longer and omitted here
x = mindates[0]  # variable used in the loop
g = datetime.datetime.now()
for i in range(len(temp_df['matdate'])):  # loop through every date
    if temp_df['matdate'][i] not in mindates:  # if the current maturity date is not yet in mindates, add it
        mindates.append(temp_df['matdate'][i])
    while min(mindates) < temp_df['date'][i]:  # while the smallest known maturity is already before the current date,
        mindates.sort()  # sort so the minimum sits at position 0
        x = mindates[0]  # note the date being dropped before dropping it
        del mindates[0]  # drop it, so the next-smallest mindate becomes the new minimum
        if temp_df['matdate'][i] != x:  # probably redundant: re-add the current matdate unless it is the one just removed
            mindates.append(temp_df['matdate'][i])
    curr_min = min(mindates)
    temp_df.loc[i, 'newmatdate'] = curr_min  # store the current minimum (.loc avoids chained assignment)
h = datetime.datetime.now()
print('loop took ' + str((h - g).seconds) + ' seconds')
date = [d.date() for d in temp_df['date']]  # convert pandas Timestamps to datetime.date so np.busday_count() accepts them
newmatdate = [d.date() for d in temp_df['newmatdate']]
temp_df['BusinessDaysToNextMat'] = np.busday_count(date, newmatdate)  # phew
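The list-plus-`sort` bookkeeping in the loop is what a min-heap does natively. A minimal sketch of the same idea with Python's standard `heapq` module, assuming one code's rows are sorted by `date` and that `matdate` is NaT on business days with no new trade; the helper name `next_mat_single_code` is hypothetical. Each maturity is pushed once and popped at most once, so there is no repeated sorting.

```python
import heapq

import numpy as np
import pandas as pd


def next_mat_single_code(dates, matdates):
    """For one code's rows sorted by date, return the nearest maturity that is
    known so far and not yet passed (NaT if none), using a min-heap."""
    heap = []
    out = []
    for day, mat in zip(dates, matdates):
        if pd.notna(mat):
            heapq.heappush(heap, mat)  # a new maturity becomes known today
        while heap and heap[0] < day:
            heapq.heappop(heap)  # drop maturities already in the past
        out.append(heap[0] if heap else pd.NaT)
    return out


# Code A from the question, spread over every business day
a = pd.DataFrame({
    'date': pd.to_datetime(['2018-01-02', '2018-01-03', '2018-01-10', '2018-01-15']),
    'matdate': pd.to_datetime(['2018-01-05', '2018-01-06', '2018-01-15', '2018-01-17']),
})
days = pd.DataFrame({'date': pd.bdate_range('2018-01-02', '2018-01-15')})
grid = days.merge(a, on='date', how='left')  # matdate is NaT on days without a new trade

grid['nextmat'] = next_mat_single_code(grid['date'], grid['matdate'])
has = grid['nextmat'].notna()
grid['BusinessDaysToNextMat'] = 0
grid.loc[has, 'BusinessDaysToNextMat'] = np.busday_count(
    grid.loc[has, 'date'].values.astype('datetime64[D]'),
    grid.loc[has, 'nextmat'].values.astype('datetime64[D]'),
)
print(grid['BusinessDaysToNextMat'].tolist())  # -> [3, 2, 1, 0, 0, 0, 3, 2, 1, 0]
```

Days where every known maturity has already passed get NaT and stay at 0, matching the convention in the expected output.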
This also only works for a single code; I then loop over all the codes.
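A loop-free sketch that handles all codes at once, assuming the same `df` layout as in the question. It builds the business-day × code grid, pairs each grid row with every trade of the same code, keeps only maturities that are already known and not yet passed, and takes the minimum. This is a swapped-in technique (a within-code cross join), not a method from the answers above. Its intermediate frame has roughly (business days × trades per code) rows, so for a very long history a single-pass loop may be lighter on memory.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2018-01-02', '2018-01-03', '2018-01-08',
                            '2018-01-10', '2018-01-11', '2018-01-15']),
    'matdate': pd.to_datetime(['2018-01-05', '2018-01-06', '2018-01-12',
                               '2018-01-15', '2018-01-16', '2018-01-17']),
    'code': ['A', 'A', 'B', 'A', 'B', 'A'],
})

# 1. Every business day x every code: the index the question asks for
days = pd.bdate_range(df['date'].min(), df['date'].max())
grid = pd.MultiIndex.from_product(
    [days, df['code'].unique()], names=['date', 'code']
).to_frame(index=False)

# 2. Pair each grid row with every trade of the same code, then keep maturities
#    that are already known (trade date <= day) and not yet passed (matdate >= day)
pairs = grid.merge(df.rename(columns={'date': 'tradedate'}), on='code')
live = pairs[(pairs['tradedate'] <= pairs['date']) & (pairs['matdate'] >= pairs['date'])]

# 3. Nearest such maturity per (day, code); days with none get NaT
nearest = live.groupby(['date', 'code'], as_index=False)['matdate'].min()
out = grid.merge(nearest, on=['date', 'code'], how='left')

# 4. Business days to it, 0 where no maturity is known
has = out['matdate'].notna()
out['BusinessDaysToNextMat'] = 0
out.loc[has, 'BusinessDaysToNextMat'] = np.busday_count(
    out.loc[has, 'date'].values.astype('datetime64[D]'),
    out.loc[has, 'matdate'].values.astype('datetime64[D]'),
)

# Pivoted layout: business days as the index, one column per code
wide = out.pivot(index='date', columns='code', values='BusinessDaysToNextMat')
print(wide)
```

The final `pivot` gives the wide layout mentioned in the question, with every business day as the index; `out` keeps the long layout.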