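I'm trying to group a batch of time-series data into 2-hour blocks. I'm new to this, so please bear with me. Based on some earlier research, I think I can use pandas for this.
I have a dataset (mytime) that looks like this: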
['15:23', '14:41', '13:54', '07:13', '20:21', '13:15', '14:48', '12:06', '08:37', '06:32', '07:04', '14:20', '16:28',
'06:49', '08:39', '09:15', '08:54', '05:37', '14:43', '06:20', '11:25', '11:05', '09:28', '14:05', '14:24', '15:30',
'13:28', '16:55', '09:29', '17:44', '07:24', '09:37', '06:47', '14:35', '10:55', '22:29', '06:24', '09:25', '06:45',
'23:49', '19:34', '01:31', '14:22', '13:58', '09:08', '05:11', '08:09', '08:52', '02:50', '12:51', '17:33', '07:07',
'08:11', '10:06', '23:48', '22:27', '11:15', '15:09', '16:45', '20:42', '12:12', '07:08', '16:13', '20:40', '17:26',
'18:57', '15:07', '09:19', '09:10', '09:17', '09:26', '14:18', '06:31', '14:13', '14:01', '08:57', '21:34']
I'd like to take this dataset and essentially see output like this:
0-2: 4
2-4: 7
4-6: 3
6-8: 3
8-10: 2
10-12: 5
12-14: 14
....etc
Here is a subset of my code:
import csv
import datetime
from collections import Counter
import pandas as pd
import numpy as np

mycount = Counter()
mytime = []
with open('temp_dates.csv') as csvfile2:
    readCSV2 = csv.reader(csvfile2, delimiter=',')
    incoming = []
    for row in readCSV2:
        readin = row[0]
        time = row[1]
        year, month, day = (int(x) for x in readin.split('-'))
        ans = datetime.date(year, month, day)
        wkday = ans.strftime("%A")
        incoming.append([wkday, time])
        mycount[wkday] += 1
        mytime.append(time)
with open('new_dates2.csv', 'w') as out_file:
    writer = csv.writer(out_file)
    writer.writerows(incoming)
for key, value in sorted(mycount.items()):
    daylist = key, value
    print(daylist)
#print(mytime)
df = pd.DataFrame()
#print(df)
df.groupby([df['mytime'],pd.TimeGrouper(freq='2H')])
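I suspect my first problem is that the data isn't in a format TimeGrouper can understand? Secondly, I'm probably missing something that tells the DataFrame what to look at? Any help would be greatly appreciated.
As requested, here is a snippet of the original source CSV file (we are only talking about column 2, which is what populates 'mytime').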
Sunday,14:35
Sunday,10:55
Friday,22:29
Friday,06:24
Thursday,09:25
Wednesday,06:45
Answer 0 (score: 1)
UPDATE:
In [96]: mytime = ['15:23', '14:41', '13:54', '07:13', '20:21', '13:15', '14:48', '12:06', '08:37', '06:32', '07:04', '14:20', '16:28',
...:
...: '06:49', '08:39', '09:15', '08:54', '05:37', '14:43', '06:20', '11:25', '11:05', '09:28', '14:05', '14:24', '15:30',
...: '13:28', '16:55', '09:29', '17:44', '07:24', '09:37', '06:47', '14:35', '10:55', '22:29', '06:24', '09:25', '06:45',
...: '23:49', '19:34', '01:31', '14:22', '13:58', '09:08', '05:11', '08:09', '08:52', '02:50', '12:51', '17:33', '07:07',
...: '08:11', '10:06', '23:48', '22:27', '11:15', '15:09', '16:45', '20:42', '12:12', '07:08', '16:13', '20:40', '17:26',
...: '18:57', '15:07', '09:19', '09:10', '09:17', '09:26', '14:18', '06:31', '14:13', '14:01', '08:57', '21:34']
In [97]: s = pd.to_datetime(mytime).to_series()
In [98]: s
Out[98]:
2017-04-26 15:23:00 2017-04-26 15:23:00
2017-04-26 14:41:00 2017-04-26 14:41:00
2017-04-26 13:54:00 2017-04-26 13:54:00
2017-04-26 07:13:00 2017-04-26 07:13:00
2017-04-26 20:21:00 2017-04-26 20:21:00
2017-04-26 13:15:00 2017-04-26 13:15:00
2017-04-26 14:48:00 2017-04-26 14:48:00
2017-04-26 12:06:00 2017-04-26 12:06:00
2017-04-26 08:37:00 2017-04-26 08:37:00
2017-04-26 06:32:00 2017-04-26 06:32:00
...
2017-04-26 09:19:00 2017-04-26 09:19:00
2017-04-26 09:10:00 2017-04-26 09:10:00
2017-04-26 09:17:00 2017-04-26 09:17:00
2017-04-26 09:26:00 2017-04-26 09:26:00
2017-04-26 14:18:00 2017-04-26 14:18:00
2017-04-26 06:31:00 2017-04-26 06:31:00
2017-04-26 14:13:00 2017-04-26 14:13:00
2017-04-26 14:01:00 2017-04-26 14:01:00
2017-04-26 08:57:00 2017-04-26 08:57:00
2017-04-26 21:34:00 2017-04-26 21:34:00
dtype: datetime64[ns]
In [106]: s.groupby(pd.cut(s.dt.hour,
...: bins=np.arange(26, step=2),
...: right=False,
...: include_lowest=True)) \
...: .size()
...:
Out[106]:
[0, 2) 1
[2, 4) 1
[4, 6) 2
[6, 8) 12
[8, 10) 17
[10, 12) 5
[12, 14) 7
[14, 16) 15
[16, 18) 7
[18, 20) 2
[20, 22) 4
[22, 24) 4
dtype: int64
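Since the question tried `pd.TimeGrouper`, it may help to see the time-based route as well. `pd.TimeGrouper` was deprecated in favour of `pd.Grouper` and `resample`, and both need real timestamps to bin on, not strings. The following is only a minimal sketch under stated assumptions: the list is a shortened sample of the question's data, the names `stamps` and `counts` are illustrative, and the frequency alias is spelled `'2h'` (the thread itself uses `'2H'`).

```python
import pandas as pd

# Shortened sample of the question's data (assumption: "HH:MM" strings).
mytime = ['15:23', '14:41', '13:54', '07:13', '20:21', '01:31', '02:50', '23:49']

# An explicit format avoids any guessing; the date part defaults to 1900-01-01.
stamps = pd.to_datetime(mytime, format='%H:%M')

# Put the timestamps in the index so resample() has something to bin on.
s = pd.Series(1, index=stamps).sort_index()

# Count the rows that fall into each 2-hour window (empty windows show 0).
counts = s.resample('2h').size()
print(counts)
```

The windows are aligned to midnight, so the index runs from 00:00 to 22:00 in 2-hour steps. Unlike the `pd.cut` version, the labels are timestamps rather than intervals.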
df = pd.read_csv('/path/to/file.csv', parse_dates=[1], names=['date','time'])
In [55]: df
Out[55]:
date time
0 Sunday 2017-04-26 14:35:00
1 Sunday 2017-04-26 10:55:00
2 Friday 2017-04-26 22:29:00
3 Friday 2017-04-26 06:24:00
4 Thursday 2017-04-26 09:25:00
5 Wednesday 2017-04-26 06:45:00
In [59]: df.groupby(pd.cut(df.time.dt.hour, bins=np.arange(26, step=2), include_lowest=True)).size()
Out[59]:
time
[0, 2] 0
(2, 4] 0
(4, 6] 2
(6, 8] 0
(8, 10] 2
(10, 12] 0
(12, 14] 1
(14, 16] 0
(16, 18] 0
(18, 20] 0
(20, 22] 1
(22, 24] 0
dtype: int64
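Note that the call above omits `right=False`, so the bins are closed on the right: a time such as 06:24 (hour 6) is counted in `(4, 6]` and not in a 6-8 block. The sketch below compares the two settings on the hours of the six sample rows; the variable names are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hours of the six sample rows: 14:35, 10:55, 22:29, 06:24, 09:25, 06:45.
hours = pd.Series([14, 10, 22, 6, 9, 6])
bins = np.arange(26, step=2)  # 0, 2, ..., 24

# Default (right=True): right-closed bins, so hour 6 lands in (4, 6].
right_closed = pd.cut(hours, bins=bins, include_lowest=True)

# right=False: left-closed bins, so hour 6 lands in [6, 8).
left_closed = pd.cut(hours, bins=bins, right=False)

# observed=False keeps the empty bins in the result.
sizes = hours.groupby(left_closed, observed=False).size()
print(sizes)
```

With `right=False`, each label `[a, b)` matches the "a-b" blocks asked for in the question.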
Answer 1 (score: 0)
This is what I came up with. I'm still working on the sorting, as you'll see in the output:
data = ['15:23', '14:41', '13:54', '07:13', '20:21', '13:15', '14:48', '12:06', '08:37', '06:32', '07:04', '14:20', '16:28',
'06:49', '08:39', '09:15', '08:54', '05:37', '14:43', '06:20', '11:25', '11:05', '09:28', '14:05', '14:24', '15:30',
'13:28', '16:55', '09:29', '17:44', '07:24', '09:37', '06:47', '14:35', '10:55', '22:29', '06:24', '09:25', '06:45',
'23:49', '19:34', '01:31', '14:22', '13:58', '09:08', '05:11', '08:09', '08:52', '02:50', '12:51', '17:33', '07:07',
'08:11', '10:06', '23:48', '22:27', '11:15', '15:09', '16:45', '20:42', '12:12', '07:08', '16:13', '20:40', '17:26',
'18:57', '15:07', '09:19', '09:10', '09:17', '09:26', '14:18', '06:31', '14:13', '14:01', '08:57', '21:34']
import pandas as pd
df = pd.DataFrame({'mytime': data})
df['mytime'] = pd.to_datetime(df['mytime']).dt.floor('2H').dt.time
df['hour'] = df.mytime.apply(lambda x: str(x.hour) + '-' + str(x.hour +2))
df = df.groupby('hour').size()
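The sorting problem most likely comes from grouping on string labels, which sort lexicographically (`'10-12'` comes before `'2-4'`). One possible workaround, sketched here on a shortened sample and with illustrative names (`start`, `counts`), is to group on the integer start hour of each block and build the labels only afterwards:

```python
import pandas as pd

# Shortened sample of the data (assumption: "HH:MM" strings).
data = ['15:23', '07:13', '20:21', '01:31', '02:50', '10:55', '11:25', '23:49']
df = pd.DataFrame({'mytime': data})

# Integer hour of each time, rounded down to the start of its 2-hour block.
hour = pd.to_datetime(df['mytime'], format='%H:%M').dt.hour
start = hour - hour % 2

# groupby sorts its keys, and integer keys sort numerically.
counts = df.groupby(start).size()

# Turn the sorted integer keys into "a-b" labels.
counts.index = [f'{h}-{h + 2}' for h in counts.index]
print(counts)
```

Blocks with no entries are simply absent from the result here; reindexing over `range(0, 24, 2)` before relabelling would be one way to add them back.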
Answer 2 (score: 0)
Here is an approach that uses numpy's histogram function:
import numpy as np
data = ['15:23', '14:41', '13:54', '07:13', '20:21', '13:15', '14:48', '12:06', '08:37', '06:32', '07:04', '14:20', '16:28','06:49', '08:39', '09:15','08:54', '05:37', '14:43', '06:20', '11:25', '11:05', '09:28', '14:05','14:24', '15:30', '13:28', '16:55', '09:29', '17:44', '07:24', '09:37','06:47', '14:35', '10:55', '22:29', '06:24', '09:25', '06:45', '23:49','19:34', '01:31', '14:22', '13:58', '09:08', '05:11', '08:09', '08:52','02:50', '12:51', '17:33', '07:07', '08:11', '10:06', '23:48', '22:27','11:15', '15:09', '16:45', '20:42', '12:12', '07:08', '16:13', '20:40','17:26', '18:57', '15:07', '09:19', '09:10', '09:17', '09:26', '14:18', '06:31', '14:13', '14:01', '08:57', '21:34']
time = [int(h) + int(m)/60 for h, m in (y.split(':') for y in data)]
bins = list(range(0, 26, 2))
counts, bins = np.histogram(time, bins)
dict(zip(bins, counts))
Result:
{0: 1,
2: 1,
4: 2,
6: 12,
8: 17,
10: 5,
12: 7,
14: 15,
16: 7,
18: 2,
20: 4,
22: 4}
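To get closer to the "0-2: n" layout requested in the question, the bin edges returned by `np.histogram` can be paired up into labels. This is just a sketch on a shortened sample; `edges` and `lines` are illustrative names.

```python
import numpy as np

# Shortened sample of the data (assumption: "HH:MM" strings).
data = ['15:23', '14:41', '07:13', '01:31', '02:50', '23:49']

# Convert each time to fractional hours, as in the answer above.
hours = [int(h) + int(m) / 60 for h, m in (t.split(':') for t in data)]

# Bin edges 0, 2, ..., 24 give twelve 2-hour bins.
counts, edges = np.histogram(hours, bins=np.arange(0, 26, 2))

# Pair each left edge with its right edge to build "a-b: n" lines.
lines = [f'{int(lo)}-{int(hi)}: {int(n)}'
         for lo, hi, n in zip(edges[:-1], edges[1:], counts)]
print('\n'.join(lines))
```

`np.histogram` treats every bin as left-closed except the last one, which also includes its right edge, so a value of exactly 24 would be counted in the 22-24 bin.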