How can I calculate total days, hours, and minutes using Python pandas?

Time: 2016-04-13 05:51:55

Tags: python python-2.7 pandas

This is my DataFrame. For each product, I want to find the total time taken (the sum of time2 - time1 over its rows).

product,query,time1,time2
A,a1,25-06-15 08:42:43.830000000 PM,25-06-15 08:42:43.830000000 PM
A,a2,03-07-15 11:57:10.557000000 AM,03-07-15 11:57:10.557000000 AM
A,a3,02-07-15 02:32:33.090000000 PM,02-07-15 02:32:33.090000000 PM
A,a4,04-07-15 11:51:59.090000000 AM,04-07-15 11:51:59.090000000 AM
A,a5,27-06-15 07:12:30.250000000 PM,27-06-15 07:47:40.270000000 PM
B,b1,30-06-15 07:48:22.090000000 PM,30-06-15 07:48:22.090000000 PM
B,b1,01-07-15 02:59:36.290000000 PM,02-07-15 05:37:40.700000000 PM
B,b1,29-06-15 01:28:07.250000000 PM,20-07-15 12:57:06.343000000 PM
B,b1,03-07-15 05:58:52.737000000 PM,03-07-15 06:06:23.977000000 PM
B,b1,26-06-15 12:56:36.210000000 AM,26-06-15 12:56:36.210000000 AM
B,b1,22-06-15 08:16:10.743000000 PM,22-06-15 08:16:10.743000000 PM
B,b1,29-06-15 11:35:36.807000000 AM,29-06-15 11:55:01.690000000 AM

I need output like this:

Product,query_count,total_time_taken
A,5,total time taken
B,7,total time taken

2 answers:

Answer 0: (score: 2)

I think you can use groupby with apply and a custom function f:

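import pandas as pd

# assumes both timestamps were read into 'time1' as one tab-separated string;
# skip this split if 'time1' and 'time2' are already separate columns
df[['time1', 'time2']] = df['time1'].str.split('\t').apply(pd.Series)

# convert both columns to datetime
df['time1'] = pd.to_datetime(df['time1'])
df['time2'] = pd.to_datetime(df['time2'])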

def f(x):
    return pd.Series([(x.time2 - x.time1).sum(), 
                      len(x)], 
                     index=['total_time_taken', 'query_count'])

print(df.groupby('product').apply(f))

                total_time_taken  query_count
product                                      
A         0 days 00:35:10.020000            5
B        52 days 02:33:59.626000            7
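
A note on date parsing: the timestamps in the question look day-first (dd-mm-yy), and the 52-day total for B above matches a month-first reading of the ambiguous row 01-07-15 / 02-07-15 (7 January to 7 February rather than 1 July to 2 July). The following is a minimal sketch, assuming the data really is day-first with a 12-hour clock, that passes an explicit format to pd.to_datetime. The names csv_text, fmt and result are illustrative, and the named-aggregation syntax assumes pandas 0.25 or later.

```python
import io
import pandas as pd

# Sample rows copied from the question (assumed dd-mm-yy, 12-hour clock).
csv_text = """product,query,time1,time2
A,a1,25-06-15 08:42:43.830000000 PM,25-06-15 08:42:43.830000000 PM
A,a2,03-07-15 11:57:10.557000000 AM,03-07-15 11:57:10.557000000 AM
A,a3,02-07-15 02:32:33.090000000 PM,02-07-15 02:32:33.090000000 PM
A,a4,04-07-15 11:51:59.090000000 AM,04-07-15 11:51:59.090000000 AM
A,a5,27-06-15 07:12:30.250000000 PM,27-06-15 07:47:40.270000000 PM
B,b1,30-06-15 07:48:22.090000000 PM,30-06-15 07:48:22.090000000 PM
B,b1,01-07-15 02:59:36.290000000 PM,02-07-15 05:37:40.700000000 PM
B,b1,29-06-15 01:28:07.250000000 PM,20-07-15 12:57:06.343000000 PM
B,b1,03-07-15 05:58:52.737000000 PM,03-07-15 06:06:23.977000000 PM
B,b1,26-06-15 12:56:36.210000000 AM,26-06-15 12:56:36.210000000 AM
B,b1,22-06-15 08:16:10.743000000 PM,22-06-15 08:16:10.743000000 PM
B,b1,29-06-15 11:35:36.807000000 AM,29-06-15 11:55:01.690000000 AM
"""

df = pd.read_csv(io.StringIO(csv_text))

# Explicit day-first format; pandas' %f accepts up to nine fractional digits.
fmt = '%d-%m-%y %I:%M:%S.%f %p'
df['time1'] = pd.to_datetime(df['time1'], format=fmt)
df['time2'] = pd.to_datetime(df['time2'], format=fmt)

# Per-row duration, then count and sum per product.
df['time'] = df['time2'] - df['time1']
result = df.groupby('product').agg(
    query_count=('query', 'count'),
    total_time_taken=('time', 'sum'),
)
print(result)
```

Under this day-first assumption the total for A is unchanged (0 days 00:35:10.020000), while B comes to 22 days 02:33:59.626000 instead of 52 days. If the data is actually month-first, the plain pd.to_datetime call is the one to keep.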

Answer 1: (score: 2)

>>> df['time'] = df.time2 - df.time1
>>> (df.groupby('product')
...    .agg({'query': 'count', 'time': 'sum'})
...    .rename(columns={'query': 'query_count', 'time': 'total_time_taken'}))
         query_count        total_time_taken
product                                     
A                  5  0 days 00:35:10.020000
B                  7 52 days 02:33:59.626000
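
Since the question asks for days, hours, and minutes, the summed Timedelta values can be broken into those parts. This is a small sketch, not part of the original answer: it rebuilds the two totals printed above as a Series (the names totals, parts and total_minutes are illustrative) and uses the .dt.components accessor.

```python
import pandas as pd

# Totals as printed in the output above, rebuilt by hand for illustration.
totals = pd.Series(
    pd.to_timedelta(['0 days 00:35:10.020000', '52 days 02:33:59.626000']),
    index=pd.Index(['A', 'B'], name='product'),
    name='total_time_taken',
)

# .dt.components splits each Timedelta into days, hours, minutes, seconds, ...
parts = totals.dt.components[['days', 'hours', 'minutes']]
print(parts)

# Alternatively, express each whole duration in a single unit (minutes here).
total_minutes = totals.dt.total_seconds() / 60
```

The same accessor works directly on the total_time_taken column of the grouped result, for example result['total_time_taken'].dt.components.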

To recreate the original DataFrame:

import pandas as pd
from pandas import Timestamp

# 5 rows for product A and 7 for product B, matching the question's data
df = pd.DataFrame(
    {'product': ['A'] * 5 + ['B'] * 7,
     'query': ['a1', 'a2', 'a3', 'a4', 'a5'] + ['b1'] * 7,
     'time1': [
         Timestamp('2015-06-25 20:42:43.830000'),
         Timestamp('2015-03-07 11:57:10.557000'),
         Timestamp('2015-02-07 14:32:33.090000'),
         Timestamp('2015-04-07 11:51:59.090000'),
         Timestamp('2015-06-27 19:12:30.250000'),
         Timestamp('2015-06-30 19:48:22.090000'),
         Timestamp('2015-01-07 14:59:36.290000'),
         Timestamp('2015-06-29 13:28:07.250000'),
         Timestamp('2015-03-07 17:58:52.737000'),
         Timestamp('2015-06-26 00:56:36.210000'),
         Timestamp('2015-06-22 20:16:10.743000'),
         Timestamp('2015-06-29 11:35:36.807000')],
     'time2': [
         Timestamp('2015-06-25 20:42:43.830000'),
         Timestamp('2015-03-07 11:57:10.557000'),
         Timestamp('2015-02-07 14:32:33.090000'),
         Timestamp('2015-04-07 11:51:59.090000'),
         Timestamp('2015-06-27 19:47:40.270000'),
         Timestamp('2015-06-30 19:48:22.090000'),
         Timestamp('2015-02-07 17:37:40.700000'),
         Timestamp('2015-07-20 12:57:06.343000'),
         Timestamp('2015-03-07 18:06:23.977000'),
         Timestamp('2015-06-26 00:56:36.210000'),
         Timestamp('2015-06-22 20:16:10.743000'),
         Timestamp('2015-06-29 11:55:01.690000')]})