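Average checkout time in a pandas DataFrame

Posted: 2014-07-01 20:30:36

Tags: python pandas dataframe

EDIT: How can I slice this DataFrame to create a new DataFrame that contains only one date, with the company name and timestamp?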

```
    Google.com 2012-05-01 18:20:27.167000
1   Google.com 2012-05-01 19:16:08.070000
2   Google.com 2012-05-01 19:20:07.880000
3   Google.com 2012-05-01 19:33:02.200000
4   Google.com 2012-05-01 19:35:09.173000
5   Google.com 2012-05-01 20:18:55.610000
6   Google.com 2012-05-01 20:26:27.577000
8   Google.com 2012-05-02 12:51:12.013000
9   Google.com 2012-05-02 12:56:52.013000
10  Google.com 2012-05-02 12:59:55.167000
11  Google.com 2012-05-02 13:04:25.687000
12  Google.com 2012-05-02 13:16:36.263000
```

i.e. something like this:

```
    Google.com 2012-05-01 18:20:27.167000
1   Google.com 2012-05-01 19:16:08.070000
2   Google.com 2012-05-01 19:20:07.880000
3   Google.com 2012-05-01 19:33:02.200000
4   Google.com 2012-05-01 19:35:09.173000
5   Google.com 2012-05-01 20:18:55.610000
6   Google.com 2012-05-01 20:26:27.577000
```

and then calculate the average checkout time for that date?
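A minimal sketch of the slice the question describes, using a few rows copied from the listing. The column names `site` and `timestamp` are hypothetical (the listing shows no header). A boolean mask on the calendar date selects one day into a new DataFrame, and averaging the consecutive differences gives the mean gap for that day:

```python
import pandas as pd

# A few rows from the question's listing; the column names "site" and
# "timestamp" are hypothetical, since the listing has no header.
df = pd.DataFrame({
    "site": ["Google.com"] * 5,
    "timestamp": pd.to_datetime([
        "2012-05-01 19:16:08.070",
        "2012-05-01 19:20:07.880",
        "2012-05-01 20:26:27.577",
        "2012-05-02 12:51:12.013",
        "2012-05-02 12:56:52.013",
    ]),
})

# Keep only the rows whose calendar date is 2012-05-01.
# normalize() sets each timestamp's time of day to midnight.
day = pd.Timestamp("2012-05-01")
one_day = df[df["timestamp"].dt.normalize() == day].copy()

# Average gap between consecutive timestamps on that day;
# mean() skips the leading NaT that diff() produces.
mean_gap = one_day["timestamp"].diff().mean()

print(one_day)
print(mean_gap)
```

If `timestamp` were the index (a `DatetimeIndex`), `df.set_index("timestamp").loc["2012-05-01"]` would select the same day through partial-string indexing.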

1 answer:

Answer (score: 1)

You can do it like this:

First, I create a DataFrame:

```python
import pandas as pd
from io import StringIO  # Python 3 location of StringIO

text = """site date time
1   Google.com 2012-05-01 19:16:08.070000
2   Google.com 2012-05-01 19:20:07.880000
3   Google.com 2012-05-01 19:33:02.200000
4   Google.com 2012-05-01 19:35:09.173000
5   Google.com 2012-05-01 20:18:55.610000
6   Google.com 2012-05-01 20:26:27.577000
8   Google.com 2012-05-02 12:51:12.013000
9   Google.com 2012-05-02 12:56:52.013000
10  Google.com 2012-05-02 12:59:55.167000
11  Google.com 2012-05-02 13:04:25.687000
12  Google.com 2012-05-02 13:16:36.263000
"""
# Fields are separated by runs of whitespace; the leading number of each
# row (one field more than the header) becomes the index.
tab = pd.read_csv(StringIO(text), index_col=0, sep=r'\s+')
```
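The DataFrame above keeps `date` and `time` as two string columns. If a single `datetime64` column is preferred (so that `.dt` accessors and date arithmetic work directly), one option is to join the two strings and parse them once. A sketch on a trimmed copy of the sample data, where the new column name `timestamp` is a hypothetical choice:

```python
from io import StringIO

import pandas as pd

# A few rows in the same whitespace-separated layout as the sample above.
text = """site date time
1   Google.com 2012-05-01 19:16:08.070000
2   Google.com 2012-05-01 19:20:07.880000
8   Google.com 2012-05-02 12:51:12.013000
"""
tab = pd.read_csv(StringIO(text), index_col=0, sep=r"\s+")

# Join the date and time strings and parse them into one datetime64 column
# ("timestamp" is a hypothetical name).
tab["timestamp"] = pd.to_datetime(tab["date"] + " " + tab["time"])

print(tab.dtypes)
```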

Then split the data by date and, for each date, compute the mean of the time differences between consecutive entries:

```python
# For each date: parse the timestamps, take consecutive differences, average them.
for group, value in tab.groupby('date'):
    print(group)
    print(pd.to_datetime(value['date'] + ' ' + value['time']).diff().mean())

## 2012-05-01
## 0 days 00:14:03.901400
## 2012-05-02
## 0 days 00:06:21.062500
```
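The loop can also be written as a single `groupby` that returns one mean gap per date as a Series. The sketch below reuses the sample data and also checks a property of this statistic: the consecutive differences telescope, so for a date with entries t_1, ..., t_n their mean is (t_n - t_1) / (n - 1). For 2012-05-01 that is (20:26:27.577 - 19:16:08.070) / 5 = 4219.507 s / 5 = 843.9014 s, i.e. 14 min 3.9014 s, matching the output above. The variable names are illustrative.

```python
from io import StringIO

import pandas as pd

text = """site date time
1   Google.com 2012-05-01 19:16:08.070000
2   Google.com 2012-05-01 19:20:07.880000
3   Google.com 2012-05-01 19:33:02.200000
4   Google.com 2012-05-01 19:35:09.173000
5   Google.com 2012-05-01 20:18:55.610000
6   Google.com 2012-05-01 20:26:27.577000
8   Google.com 2012-05-02 12:51:12.013000
9   Google.com 2012-05-02 12:56:52.013000
10  Google.com 2012-05-02 12:59:55.167000
11  Google.com 2012-05-02 13:04:25.687000
12  Google.com 2012-05-02 13:16:36.263000
"""
tab = pd.read_csv(StringIO(text), index_col=0, sep=r"\s+")
ts = pd.to_datetime(tab["date"] + " " + tab["time"])

# One mean gap per date, as a Series indexed by the date string.
mean_gap = ts.groupby(tab["date"]).apply(lambda s: s.diff().mean())

# Telescoping shortcut: (last - first) / (number of entries - 1) per date.
g = ts.groupby(tab["date"])
shortcut = (g.last() - g.first()) / (g.count() - 1)

print(mean_gap)
```

The shortcut makes it clear that this "average interval" depends only on the first and last entry of each day and on how many entries there are.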