Python Pandas DataFrame: populating another data frame based on DateTime criteria

Time: 2018-03-18 12:26:15

Tags: python pandas dataframe

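I created a simple data frame, "F_test". I now want to populate another data frame, "P", with data from "F_test": a cell in "P" should receive the price when it is in the same row as the "F_test" contract and its column date falls between that row's startdate and enddate.

However, when I run a simple for loop to do this, no data in the "P" matrix is updated beyond the first row.

In the code on my PC I actually pull the "F_test" data from an Excel file, but to provide a complete dataset on this forum I manually created the simple data frame named "F_test".

As you can probably tell from the code, I am a recent convert from the Matlab / VBA Excel world...

I would really appreciate your thoughts on this.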

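import numpy as np
import pandas as pd

#each tuple is (startdate, enddate, price), dates written as day/month/year
F0 = ('08/02/2018','08/02/2018',50)
F1 = ('08/02/2018','09/02/2018',52)
F2 = ('10/02/2018','11/02/2018',46)
F3 = ('12/02/2018','16/02/2018',55)
F4 = ('09/02/2018','28/02/2018',48)
F_mat = [F0,F1,F2,F3,F4]
#build the frame straight from the tuples so that price stays numeric
F_test = pd.DataFrame(F_mat,columns=['startdate','enddate','price'])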

#convert string dates (day/month/year) into DateTime data type
F_test['startdate'] = pd.to_datetime(F_test['startdate'], format='%d/%m/%Y')
F_test['enddate'] = pd.to_datetime(F_test['enddate'], format='%d/%m/%Y')

#the rest of the code refers to the data frame as F
F = F_test.copy()

#create contract duration column
F['duration'] = (F['enddate'] - F['startdate']).dt.days + 1

#re-order the F matrix by column 'duration', ensure that the bootstrapping 
#prioritises the shorter term contracts 
#sort_values returns a new data frame, so assign the result back to F
F = F.sort_values(by=['duration'], ascending=[True])

#create D matrix, dataframe containing each day from start to end date
tempDateRange = pd.date_range(start=F['startdate'].min(), end=F['enddate'].max(), freq='D')
D = pd.DataFrame(tempDateRange)

#define Nb of Calendar Days in a variable to be used later
intNbCalendarDays = (F['enddate'].max() - F['startdate'].min()).days + 1

#define Nb of Contracts in a variable to be used later
intNbContracts = len(F)

#define a zero filled matrix, P, which will house the contract prices 
P = pd.DataFrame(np.zeros((intNbContracts, intNbCalendarDays)))

#rename columns of P to be the dates contained in matrix array D
P.columns = tempDateRange 

#create prices in correct rows in P
for i in list(range(0, intNbContracts)):
    for j in list(range(0, intNbCalendarDays)):
        if ((F.iloc[i,0] >= P.columns[j]) & (F.iloc[i,1] <= P.columns[j] )):
            P.iloc[i,j] = F.iloc[i,2]
P

2 Answers:

Answer 0: (score: 0)

I think your date comparison at the end is the wrong way round, and you should use 'and', not '&' (which is the bitwise operator). Try this:

# create prices in correct rows in P
for i in list(range(0, intNbContracts)):
    for j in list(range(0, intNbCalendarDays)):
        if (F.iloc[i, 0] <= P.columns[j]) and (F.iloc[i, 1] >= P.columns[j]):
            P.iloc[i, j] = F.iloc[i, 2]
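
To see why the direction of the comparison matters, here is a minimal check using a single hypothetical contract (the dates below are made up for illustration and are not taken from the question). The original condition can only hold when startdate, enddate and the column date are all equal, which would be consistent with only the one-day contract in the first row being filled.

```python
import pandas as pd

# Hypothetical contract running from 9 Feb to 11 Feb 2018 (illustration only)
start = pd.Timestamp("2018-02-09")
end = pd.Timestamp("2018-02-11")
days = pd.date_range("2018-02-08", "2018-02-12", freq="D")

# Condition as written in the question: start >= day and end <= day
original = [bool((start >= d) and (end <= d)) for d in days]

# Corrected condition: start <= day <= end
corrected = [bool((start <= d) and (end >= d)) for d in days]

print(original)
print(corrected)
```

With the original condition no day is selected for this contract, whereas the corrected one selects the three days from 9 Feb to 11 Feb.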

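Answer 1: (score: 0)

This may still not be as efficient as it could be, but I think it is better. It would replace your code from the "#create D matrix, dataframe containing..." comment onward.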

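# create prices P
rows = []
for index, row in F.iterrows():
    # one Series per contract: its price on every day from startdate to enddate
    dates = pd.date_range(row['startdate'], row['enddate'])
    rows.append(pd.Series(row['price'], index=dates))

# DataFrame.append was removed in pandas 2.0, so build P from the list of rows
P = pd.DataFrame(rows).reset_index(drop=True)

# days on which a contract is not active have no value; set them to 0
P = P.fillna(0)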
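
If the loops become a bottleneck, one possible alternative is to build the whole matrix at once with NumPy broadcasting. This is only a sketch, not something proposed in the answers above: it rebuilds the sample data from the question, assumes the dates are day/month/year, and the names `days`, `active`, `start`, `end` and `day_values` are introduced here for illustration. Note that `&` is the right operator in this version because it combines boolean arrays element-wise, whereas `and` only works on single True/False values.

```python
import numpy as np
import pandas as pd

# Sample contracts from the question: (startdate, enddate, price)
F = pd.DataFrame(
    [("08/02/2018", "08/02/2018", 50),
     ("08/02/2018", "09/02/2018", 52),
     ("10/02/2018", "11/02/2018", 46),
     ("12/02/2018", "16/02/2018", 55),
     ("09/02/2018", "28/02/2018", 48)],
    columns=["startdate", "enddate", "price"],
)
# Assumption: dates are written day/month/year
F["startdate"] = pd.to_datetime(F["startdate"], format="%d/%m/%Y")
F["enddate"] = pd.to_datetime(F["enddate"], format="%d/%m/%Y")

# Every calendar day from the earliest start to the latest end
days = pd.date_range(F["startdate"].min(), F["enddate"].max(), freq="D")

# Column vectors (contracts x 1) compared against a row vector (1 x days)
start = F["startdate"].to_numpy()[:, None]
end = F["enddate"].to_numpy()[:, None]
day_values = days.to_numpy()[None, :]

# True where startdate <= day <= enddate for that contract
active = (start <= day_values) & (day_values <= end)

# Price where the contract is active, 0 elsewhere
prices = F["price"].to_numpy()[:, None]
P = pd.DataFrame(np.where(active, prices, 0.0), index=F.index, columns=days)

print(P.iloc[:, :5])
```

Each row of `P` corresponds to one contract and each column to one calendar day, matching the layout of the zero-filled matrix in the question.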