Calculate the difference of 2 dates in a pandas groupby object of the same 2 dates

Posted: 2017-05-31 21:12:16

Tags: python pandas numpy python-datetime pandas-groupby

I'm trying to create a new column containing the number of business days between two date columns. I can't pass the date columns as arguments to a function call (I get a `TypeError: Cannot convert input` error). However, I can zip the values of the two Series into a list and use a for loop to pass the arguments. Ideally, I'd prefer to create a GroupBy object from the two date columns and calculate the difference.

Create the DataFrame:

import pandas as pd

df = pd.DataFrame.from_dict({'Date1': ['2017-05-30 16:00:00',
  '2017-05-30 16:00:00',
  '2017-05-30 16:00:00'],
 'Date2': ['2017-06-16 16:00:00',
  '2017-07-21 16:00:00',
  '2017-08-18 16:00:00'],
 'Value1': [2.97, 3.3, 4.03],
 'Value2': [96, 14, 2]})  # plain ints; the 96L long-literal syntax is Python 2 only

df['Date1'] = pd.to_datetime(df['Date1'])
df['Date2'] = pd.to_datetime(df['Date2'])

df.dtypes

Verify the DataFrame:

Date1     datetime64[ns]
Date2     datetime64[ns]
Value1           float64
Value2             int64
dtype: object

Define the function:

def date_diff(startDate, endDate):
    return float(len(pd.bdate_range(startDate, endDate)) - 1)
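`pd.bdate_range` includes both the start and the end date, and with its default `normalize=True` it drops the time of day, which is why `date_diff` subtracts 1. A quick check on the first row's pair of dates:

```python
import pandas as pd

# bdate_range normalizes start/end to midnight (normalize=True by default)
# and includes both endpoints when they fall on business days.
days = pd.bdate_range('2017-05-30 16:00:00', '2017-06-16 16:00:00')

print(len(days))      # 14 business days, endpoints included
print(len(days) - 1)  # 13, the value date_diff returns for this pair
print(days[0], days[-1])
```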

Attempt to create the column from the result of the date_diff function call:

df['DateDiff'] = date_diff(df['Date1'], df['Date2'])

TypeError:

TypeError: Cannot convert input [0   2017-05-30 16:00:00
1   2017-05-30 16:00:00
2   2017-05-30 16:00:00
Name: Date1, dtype: datetime64[ns]] of type <class 'pandas.core.series.Series'> to Timestamp
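The error comes from `pd.bdate_range`, which expects scalar start and end dates rather than whole Series. A minimal sketch of one way around it (not from the original post) is to apply the scalar `date_diff` row by row with `DataFrame.apply(..., axis=1)`. This still runs a Python-level loop internally, so it is a convenience rather than a speedup.

```python
import pandas as pd

def date_diff(startDate, endDate):
    # Scalar version from the question: business days between two dates
    return float(len(pd.bdate_range(startDate, endDate)) - 1)

df = pd.DataFrame({'Date1': pd.to_datetime(['2017-05-30 16:00:00'] * 3),
                   'Date2': pd.to_datetime(['2017-06-16 16:00:00',
                                            '2017-07-21 16:00:00',
                                            '2017-08-18 16:00:00'])})

# axis=1 passes each row as a Series, so row['Date1'] is a scalar Timestamp
df['DateDiff'] = df.apply(lambda row: date_diff(row['Date1'], row['Date2']), axis=1)
print(df)
```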

A "for loop" over a list of tuples containing the dates works, and its output is the desired result:

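# Zip the two date columns into a list of (Date1, Date2) tuples
date_List = list(zip(df['Date1'], df['Date2']))

# For each tuple, select the matching rows and store the business-day count in 'diff'
for i in range(len(date_List)):
    df.loc[(df['Date1'] == date_List[i][0]) & (df['Date2'] == date_List[i][1]), 'diff'] = date_diff(date_List[i][0], date_List[i][1])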

                Date1               Date2  Value1  Value2  diff
0 2017-05-30 16:00:00 2017-06-16 16:00:00    2.97      96  13.0
1 2017-05-30 16:00:00 2017-07-21 16:00:00    3.30      14  38.0
2 2017-05-30 16:00:00 2017-08-18 16:00:00    4.03       2  58.0

Ideally, I'd like to use a GroupBy object (grouped by Date1 & Date2):

grp = df.groupby(['Date1', 'Date2'])
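One way to put the GroupBy object to work (a sketch, not from the original post): iterating over `grp` yields `(key, sub_frame)` pairs where `key` is the `(Date1, Date2)` tuple, so `date_diff` can be computed once per unique pair and the results mapped back onto the rows. This mainly pays off when many rows share the same pair of dates; `diff_by_pair` is a hypothetical name chosen for illustration.

```python
import pandas as pd

def date_diff(startDate, endDate):
    # Scalar version from the question
    return float(len(pd.bdate_range(startDate, endDate)) - 1)

df = pd.DataFrame({'Date1': pd.to_datetime(['2017-05-30 16:00:00'] * 3),
                   'Date2': pd.to_datetime(['2017-06-16 16:00:00',
                                            '2017-07-21 16:00:00',
                                            '2017-08-18 16:00:00']),
                   'Value1': [2.97, 3.3, 4.03],
                   'Value2': [96, 14, 2]})

grp = df.groupby(['Date1', 'Date2'])

# One date_diff call per unique (Date1, Date2) group key
diff_by_pair = {key: date_diff(*key) for key, _ in grp}

# Look up each row's (Date1, Date2) key to fill the new column
df['diff'] = [diff_by_pair[key] for key in zip(df['Date1'], df['Date2'])]
print(df)
```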

1 Answer:

Answer 0 (score: 1)

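You need to do some type conversions to keep numpy happy, for example:

import numpy as np

def date_diff(start_dates, end_dates):
    # np.busday_count works on day-resolution datetime64 arrays, so convert
    # the datetime64[ns] Series values to datetime64[D] first
    return np.busday_count(
        start_dates.values.astype('datetime64[D]'),
        end_dates.values.astype('datetime64[D]'))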
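`np.busday_count(begindates, enddates)` counts valid days in the half-open interval `[begin, end)`, so the end date is excluded; that is why it matches `len(pd.bdate_range(...)) - 1` without any subtraction. It works on day-resolution dates, hence the `astype('datetime64[D]')`, which also truncates the 16:00 time of day. A small standalone check follows; the `holidays` date 2017-07-04 is an illustrative choice, not something from the question.

```python
import numpy as np

starts = np.array(['2017-05-30'] * 3, dtype='datetime64[D]')
ends = np.array(['2017-06-16', '2017-07-21', '2017-08-18'], dtype='datetime64[D]')

# Mon-Fri by default; end dates are excluded from the count
print(np.busday_count(starts, ends))  # [13 38 58]

# Dates listed in holidays are not counted as business days
print(np.busday_count(starts, ends, holidays=['2017-07-04']))  # [13 37 57]

# Converting to datetime64[D] drops the time of day
print(np.datetime64('2017-05-30T16:00:00').astype('datetime64[D]'))  # 2017-05-30
```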

Test code:

import pandas as pd
df = pd.DataFrame.from_dict({'Date1': ['2017-05-30 16:00:00',
                                       '2017-05-30 16:00:00',
                                       '2017-05-30 16:00:00'],
                             'Date2': ['2017-06-16 16:00:00',
                                       '2017-07-21 16:00:00',
                                       '2017-08-18 16:00:00'],
                             'Value1': [2.97, 3.3, 4.03],
                             'Value2': [96, 14, 2]})  # plain ints for Python 3

df['Date1'] = pd.to_datetime(df['Date1'])
df['Date2'] = pd.to_datetime(df['Date2'])

df['DateDiff'] = date_diff(df['Date1'], df['Date2'])
print(df)

Results:

                Date1               Date2  Value1  Value2  DateDiff
0 2017-05-30 16:00:00 2017-06-16 16:00:00    2.97      96        13
1 2017-05-30 16:00:00 2017-07-21 16:00:00    3.30      14        38
2 2017-05-30 16:00:00 2017-08-18 16:00:00    4.03       2        58
