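Edit to this question: how do I slice this data frame and create a new data frame that contains only one date, along with the company name and the timestamp?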
Google.com 2012-05-01 18:20:27.167000
1 Google.com 2012-05-01 19:16:08.070000
2 Google.com 2012-05-01 19:20:07.880000
3 Google.com 2012-05-01 19:33:02.200000
4 Google.com 2012-05-01 19:35:09.173000
5 Google.com 2012-05-01 20:18:55.610000
6 Google.com 2012-05-01 20:26:27.577000
8 Google.com 2012-05-02 12:51:12.013000
9 Google.com 2012-05-02 12:56:52.013000
10 Google.com 2012-05-02 12:59:55.167000
11 Google.com 2012-05-02 13:04:25.687000
12 Google.com 2012-05-02 13:16:36.263000
Something like this:
Google.com 2012-05-01 18:20:27.167000
1 Google.com 2012-05-01 19:16:08.070000
2 Google.com 2012-05-01 19:20:07.880000
3 Google.com 2012-05-01 19:33:02.200000
4 Google.com 2012-05-01 19:35:09.173000
5 Google.com 2012-05-01 20:18:55.610000
6 Google.com 2012-05-01 20:26:27.577000
And then calculate the average signing time for that date?
Answer 0: (score: 1)
You can do it like this:
First, I create a data frame:
import pandas as pd
from io import StringIO  # Python 3 location; the Python 2 StringIO module no longer exists
text = """site date time
1 Google.com 2012-05-01 19:16:08.070000
2 Google.com 2012-05-01 19:20:07.880000
3 Google.com 2012-05-01 19:33:02.200000
4 Google.com 2012-05-01 19:35:09.173000
5 Google.com 2012-05-01 20:18:55.610000
6 Google.com 2012-05-01 20:26:27.577000
8 Google.com 2012-05-02 12:51:12.013000
9 Google.com 2012-05-02 12:56:52.013000
10 Google.com 2012-05-02 12:59:55.167000
11 Google.com 2012-05-02 13:04:25.687000
12 Google.com 2012-05-02 13:16:36.263000
"""
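# The header names three columns but each row has four fields, so the leading number becomes the index.
tab = pd.read_table(StringIO(text), index_col=0, sep=r'\s+')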
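For the first part of the question (a new data frame holding a single date), a boolean mask on the date column is probably enough. The sketch below is an illustration rather than part of the original answer. It rebuilds a shortened copy of the sample data so that it runs on its own; the `idx` header and the names `target` and `one_day` are assumptions made up for the example.

```python
from io import StringIO

import pandas as pd

# Shortened copy of the sample data. The "idx" header is an assumption added
# here so that the index column is named explicitly instead of inferred.
text = """idx site date time
1 Google.com 2012-05-01 19:16:08.070000
2 Google.com 2012-05-01 19:20:07.880000
6 Google.com 2012-05-01 20:26:27.577000
8 Google.com 2012-05-02 12:51:12.013000
9 Google.com 2012-05-02 12:56:52.013000
"""
tab = pd.read_csv(StringIO(text), sep=r"\s+", index_col="idx")

# Keep only the rows whose date string equals the day of interest.
# .copy() makes the result an independent frame rather than a view-like slice.
target = "2012-05-01"
one_day = tab[tab["date"] == target].copy()
print(one_day)
```

The comparison works on plain strings here because the date column is read as text; if the column were parsed into datetimes, the mask would need a datetime on the right-hand side instead.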
Then split the data by date and compute, for each date, the mean time lag between consecutive timestamps.
for group, value in tab.groupby('date'):
    print(group)
    # Join date and time into one timestamp, then average the gaps between consecutive rows
    print(pd.to_datetime(value['date'] + ' ' + value['time']).diff().mean())
## 2012-05-01
## 0 days 00:14:03.901400
## 2012-05-02
## 0 days 00:06:21.062500
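If the per-date means are needed as data rather than printed text, the same calculation can likely be written without an explicit loop. This is a sketch under the same assumptions as the sample data above, not the original author's code; the `idx` header and the names `stamp`, `gaps` and `mean_gap` are hypothetical.

```python
from io import StringIO

import pandas as pd

# Same sample rows as in the answer, with an explicit "idx" header (an assumption).
text = """idx site date time
1 Google.com 2012-05-01 19:16:08.070000
2 Google.com 2012-05-01 19:20:07.880000
3 Google.com 2012-05-01 19:33:02.200000
4 Google.com 2012-05-01 19:35:09.173000
5 Google.com 2012-05-01 20:18:55.610000
6 Google.com 2012-05-01 20:26:27.577000
8 Google.com 2012-05-02 12:51:12.013000
9 Google.com 2012-05-02 12:56:52.013000
10 Google.com 2012-05-02 12:59:55.167000
11 Google.com 2012-05-02 13:04:25.687000
12 Google.com 2012-05-02 13:16:36.263000
"""
tab = pd.read_csv(StringIO(text), sep=r"\s+", index_col="idx")

# Combine the date and time strings into full timestamps.
stamp = pd.to_datetime(tab["date"] + " " + tab["time"])

# Gap to the previous row within the same date; the first row of each date is NaT.
gaps = stamp.groupby(tab["date"]).diff()

# Mean gap per date (NaT values are skipped), returned as a Series indexed by date.
mean_gap = gaps.groupby(tab["date"]).mean()
print(mean_gap)
```

The result should agree with the loop above: roughly 14 minutes 4 seconds for 2012-05-01 and 6 minutes 21 seconds for 2012-05-02. Grouping the differences by date keeps the overnight jump between the two days out of the averages.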