Function run across 2 dataframes

Time: 2017-08-30 08:34:52

Tags: python pandas dataframe

I have a function that computes the number of days between two dates using a 360-day year (if only it were just a plain 365-day difference, lol).

def day_count_30_360 (start_date, end_date):
    """Returns number of days between start_date and end_date, using Thirty/360 convention"""

    d1 = min(30, start_date.day)
    d2 = min(d1, end_date.day) if d1 == 30 else end_date.day

    return 360 * (end_date.year - start_date.year)\
           + 30 * (end_date.month - start_date.month)\
           + d2 - d1

I currently use a nested for loop that calls the function on every pair of cells, but this is terribly slow.

for col in range(len(df_start_dt.columns)):
    for row in range(len(df_start_dt.index)):
        df_out.iloc[row, col] = day_count_30_360(df_start_dt.iloc[row, col], df_end_dt.iloc[row, col])

Is there any way to run both dataframes through the same function without looping? Thanks!

Example dataframes (dummy data created for testing):

import pandas as pd

# pd.Timestamp is used here because the pd.datetime alias was removed in pandas 2.0
df_start_dt = pd.DataFrame([[pd.Timestamp(2004,1,1),pd.Timestamp(2004,1,1),pd.Timestamp(2004,1,1)], [pd.Timestamp(2004,2,2),pd.Timestamp(2004,2,2),pd.Timestamp(2004,2,2)]])

df_end_dt = pd.DataFrame([[pd.Timestamp(2005,1,1),pd.Timestamp(2005,1,1),pd.Timestamp(2005,1,1)], [pd.Timestamp(2005,2,2),pd.Timestamp(2005,2,2),pd.Timestamp(2006,2,2)]])

Both dataframes have the same index, headers, and dimensions.

1 answer:

Answer 0 (score: 1)

You can concat the two DataFrames, then use groupby and aggregate:

df = pd.concat([df_start_dt, df_end_dt], keys=['a','b'])
df = df.groupby(level=1).agg(lambda x: day_count_30_360(x.iat[0], x.iat[-1]))
print (df)
     0    1    2
0  360  360  360
1  360  360  720
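
To see why the concat/groupby approach above pairs the right dates, it may help to look at the intermediate frame. The following is only a sketch with hypothetical one-column frames (`starts`, `ends`, and `stacked` are illustrative names, not from the question):

```python
import pandas as pd

# Hypothetical 2x1 frames, only to show the shape of the intermediate result.
starts = pd.DataFrame([[pd.Timestamp(2004, 1, 1)], [pd.Timestamp(2004, 2, 2)]])
ends = pd.DataFrame([[pd.Timestamp(2005, 1, 1)], [pd.Timestamp(2005, 2, 2)]])

# keys=['a', 'b'] adds an outer index level that labels each source frame.
stacked = pd.concat([starts, ends], keys=['a', 'b'])

# Level 0 of the index holds the keys ('a' = start, 'b' = end);
# level 1 holds the original row labels.
print(stacked.index.tolist())

# Grouping on level 1 pairs each start row with the matching end row,
# so the first row of a group is the start date and the last is the end date.
for label, group in stacked.groupby(level=1):
    first, last = group.iloc[0, 0], group.iloc[-1, 0]
    print(label, first.date(), last.date())
```

Because the start frame is concatenated first, `x.iat[0]` is always the start date and `x.iat[-1]` the end date within each group.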

Another solution, which changes the function:

def day_count_30_360 (x):
    """Returns number of days between start_date and end_date, using Thirty/360 convention"""
    start_date = x.iat[0]
    end_date =  x.iat[-1]

    d1 = min(30, start_date.day)
    d2 = min(d1, end_date.day) if d1 == 30 else end_date.day

    return 360 * (end_date.year - start_date.year)\
           + 30 * (end_date.month - start_date.month)\
           + d2 - d1

df = pd.concat([df_start_dt, df_end_dt], keys=['a','b'])
df = df.groupby(level=1).agg(day_count_30_360)
print (df)
     0    1    2
0  360  360  360
1  360  360  720
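
Both solutions above still call a Python function once per cell pair, so they may not be much faster on large frames. A fully vectorized variant is sketched below, assuming every cell holds a valid date and both frames share the same index and columns. The name `day_count_30_360_frames` is hypothetical and not part of pandas or the original answer.

```python
import pandas as pd


def day_count_30_360_frames(df_start, df_end):
    """Element-wise Thirty/360 day count for two equally shaped date frames."""

    def parts(df):
        # Coerce every column to datetime64 so the .dt accessor is available.
        df = df.apply(pd.to_datetime)
        return (
            df.apply(lambda s: s.dt.year),
            df.apply(lambda s: s.dt.month),
            df.apply(lambda s: s.dt.day),
        )

    y1, m1, day1 = parts(df_start)
    y2, m2, day2 = parts(df_end)

    # d1 = min(30, start day)
    d1 = day1.clip(upper=30)
    # d2 = min(30, end day) where d1 == 30, otherwise the end day unchanged
    d2 = day2.where(d1 != 30, day2.clip(upper=30))

    return 360 * (y2 - y1) + 30 * (m2 - m1) + d2 - d1
```

With the frames from the question, this would be called as `df_out = day_count_30_360_frames(df_start_dt, df_end_dt)`. The arithmetic mirrors the scalar `day_count_30_360` function term by term; only the date components are extracted a whole column at a time.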