This is my dataframe. I want to find the total time taken for a given product.
product,query,time1,time2
A,a1,25-06-15 08:42:43.830000000 PM,25-06-15 08:42:43.830000000 PM
A,a2,03-07-15 11:57:10.557000000 AM,03-07-15 11:57:10.557000000 AM
A,a3,02-07-15 02:32:33.090000000 PM,02-07-15 02:32:33.090000000 PM
A,a4,04-07-15 11:51:59.090000000 AM,04-07-15 11:51:59.090000000 AM
A,a5,27-06-15 07:12:30.250000000 PM,27-06-15 07:47:40.270000000 PM
B,b1,30-06-15 07:48:22.090000000 PM,30-06-15 07:48:22.090000000 PM
B,b1,01-07-15 02:59:36.290000000 PM,02-07-15 05:37:40.700000000 PM
B,b1,29-06-15 01:28:07.250000000 PM,20-07-15 12:57:06.343000000 PM
B,b1,03-07-15 05:58:52.737000000 PM,03-07-15 06:06:23.977000000 PM
B,b1,26-06-15 12:56:36.210000000 AM,26-06-15 12:56:36.210000000 AM
B,b1,22-06-15 08:16:10.743000000 PM,22-06-15 08:16:10.743000000 PM
B,b1,29-06-15 11:35:36.807000000 AM,29-06-15 11:55:01.690000000 AM
I need this output:
Product,query_count,total_time_taken
A,5,total time taken
B,7,total time taken
Answer 0 (score: 2)
import pandas as pd
# Only needed if both timestamps were read into 'time1' as one tab-separated string
df[['time1', 'time2']] = df['time1'].str.split('\t').apply(pd.Series)
# You can first convert the columns to datetime
df['time1'] = pd.to_datetime(df['time1'])
df['time2'] = pd.to_datetime(df['time2'])
def f(x):
    # Per-product total duration and number of queries
    return pd.Series([(x.time2 - x.time1).sum(), len(x)],
                     index=['total_time_taken', 'query_count'])
print(df.groupby('product').apply(f))
               total_time_taken  query_count
product
A        0 days 00:35:10.020000            5
B       52 days 02:33:59.626000            7
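One caveat on the conversion step: the sample timestamps look day-first (`25-06-15` can only be 25 June 2015), but `pd.to_datetime` without hints reads an ambiguous value such as `01-07-15` as 7 January, which appears to be where the 52 days for B comes from. The sketch below is not part of the answer. It assumes the format string `%d-%m-%y %I:%M:%S.%f %p` matches the data (my reading of the sample, not something the question states) and uses only the B rows whose two timestamps differ.

```python
import pandas as pd

# Rows of product B from the question whose start and end times differ
df = pd.DataFrame({
    'product': ['B', 'B', 'B', 'B'],
    'time1': ['01-07-15 02:59:36.290000000 PM',
              '29-06-15 01:28:07.250000000 PM',
              '03-07-15 05:58:52.737000000 PM',
              '29-06-15 11:35:36.807000000 AM'],
    'time2': ['02-07-15 05:37:40.700000000 PM',
              '20-07-15 12:57:06.343000000 PM',
              '03-07-15 06:06:23.977000000 PM',
              '29-06-15 11:55:01.690000000 AM'],
})

# Assumed layout: day-month-two-digit year, 12-hour clock,
# fractional seconds, then AM/PM
FMT = '%d-%m-%y %I:%M:%S.%f %p'
for col in ['time1', 'time2']:
    df[col] = pd.to_datetime(df[col], format=FMT)

# Sum the per-row durations within each product
total = (df['time2'] - df['time1']).groupby(df['product']).sum()
print(total)
```

Under that assumption the total for B comes to 22 days 02:33:59.626 rather than 52 days. A is unaffected, since its only non-zero row falls on 27-06-15, which is not ambiguous.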
Answer 1 (score: 2)
>>> df['time'] = df.time2 - df.time1
>>> (df.groupby('product')
...    .agg({'query': 'count', 'time': 'sum'})
...    .rename(columns={'query': 'query_count', 'time': 'total_time_taken'}))
         query_count        total_time_taken
product
A                  5  0 days 00:35:10.020000
B                  7 52 days 02:33:59.626000
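As a possible variant of the aggregation above, and assuming pandas 0.25 or later, named aggregation produces the renamed columns in a single step. The sketch uses two rows of product A from the question; the `total_seconds` column is my own addition for illustration, in case the total is needed as a plain number.

```python
import pandas as pd

# Two rows of product A from the question: a1 has zero duration, a5 does not
df = pd.DataFrame({
    'product': ['A', 'A'],
    'query': ['a1', 'a5'],
    'time1': pd.to_datetime(['2015-06-25 20:42:43.830',
                             '2015-06-27 19:12:30.250']),
    'time2': pd.to_datetime(['2015-06-25 20:42:43.830',
                             '2015-06-27 19:47:40.270']),
})
df['time'] = df['time2'] - df['time1']

# Named aggregation: output column = (input column, aggregation)
result = df.groupby('product').agg(
    query_count=('query', 'count'),
    total_time_taken=('time', 'sum'),
)

# Illustrative extra column: the total expressed in seconds
result['total_seconds'] = result['total_time_taken'].dt.total_seconds()
print(result)
```

This avoids the separate `rename` call, at the cost of requiring a newer pandas than the dictionary form does.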
To recreate the original dataframe:
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame(
    {'product': ['A'] * 5 + ['B'] * 7,  # 5 rows for A and 7 for B, as in the sample
     'query': ['a1', 'a2', 'a3', 'a4', 'a5'] + ['b1'] * 7,
     'time1': [
         Timestamp('2015-06-25 20:42:43.830000'),
         Timestamp('2015-03-07 11:57:10.557000'),
         Timestamp('2015-02-07 14:32:33.090000'),
         Timestamp('2015-04-07 11:51:59.090000'),
         Timestamp('2015-06-27 19:12:30.250000'),
         Timestamp('2015-06-30 19:48:22.090000'),
         Timestamp('2015-01-07 14:59:36.290000'),
         Timestamp('2015-06-29 13:28:07.250000'),
         Timestamp('2015-03-07 17:58:52.737000'),
         Timestamp('2015-06-26 00:56:36.210000'),
         Timestamp('2015-06-22 20:16:10.743000'),
         Timestamp('2015-06-29 11:35:36.807000')],
     'time2': [
         Timestamp('2015-06-25 20:42:43.830000'),
         Timestamp('2015-03-07 11:57:10.557000'),
         Timestamp('2015-02-07 14:32:33.090000'),
         Timestamp('2015-04-07 11:51:59.090000'),
         Timestamp('2015-06-27 19:47:40.270000'),
         Timestamp('2015-06-30 19:48:22.090000'),
         Timestamp('2015-02-07 17:37:40.700000'),
         Timestamp('2015-07-20 12:57:06.343000'),
         Timestamp('2015-03-07 18:06:23.977000'),
         Timestamp('2015-06-26 00:56:36.210000'),
         Timestamp('2015-06-22 20:16:10.743000'),
         Timestamp('2015-06-29 11:55:01.690000')]})