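I am trying to return the maximum value grouped by hour. I tried the approach below, but the same hour (group) appears more than once in the result. I want only the maximum value for each hour.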
import pandas as pd

d = ({
'Time' : ['0/1/1900 8:00:00','0/1/1900 9:59:00','0/1/1900 10:00:00','0/1/1900 12:29:00','0/1/1900 12:30:00','0/1/1900 13:00:00','0/1/1900 13:02:00','0/1/1900 13:15:00','0/1/1900 13:20:00','0/1/1900 18:10:00','0/1/1900 18:15:00','0/1/1900 18:20:00','0/1/1900 18:25:00','0/1/1900 18:45:00','0/1/1900 18:50:00','0/1/1900 19:05:00','0/1/1900 19:07:00','0/1/1900 21:57:00','0/1/1900 22:00:00','0/1/1900 22:30:00','0/1/1900 22:35:00','1/1/1900 3:00:00','1/1/1900 3:05:00','1/1/1900 3:20:00','1/1/1900 3:25:00'],
'People' : [1,1,2,2,3,3,2,2,3,3,4,4,3,3,2,2,3,3,4,4,3,3,2,2,1],
})
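df = pd.DataFrame(data = d)
# The raw strings use day 0, which is not a valid date, so shift the day up by one before parsing
df['Time'] = ['/'.join([str(int(x.split('/')[0])+1)] + x.split('/')[1:]) for x in df['Time']]
df['Time'] = pd.to_datetime(df['Time'], format='%d/%m/%Y %H:%M:%S')
# Count rows per (hour, People) pair; 'h' is the hourly alias (uppercase 'H' is deprecated in recent pandas)
df = df.groupby([pd.Grouper(key='Time',freq='h'),df.People]).size().reset_index(name='count')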
print(df)
Time People count
0 1900-01-01 08:00:00 1 1
1 1900-01-01 09:00:00 1 1
2 1900-01-01 10:00:00 2 1
3 1900-01-01 12:00:00 2 1
4 1900-01-01 12:00:00 3 1
5 1900-01-01 13:00:00 2 2
6 1900-01-01 13:00:00 3 2
7 1900-01-01 18:00:00 2 1
8 1900-01-01 18:00:00 3 3
9 1900-01-01 18:00:00 4 2
10 1900-01-01 19:00:00 2 1
11 1900-01-01 19:00:00 3 1
12 1900-01-01 21:00:00 3 1
13 1900-01-01 22:00:00 3 1
14 1900-01-01 22:00:00 4 2
15 1900-01-02 03:00:00 1 1
16 1900-01-02 03:00:00 2 2
17 1900-01-02 03:00:00 3 1
Expected output:
Time People count
0 1900-01-01 08:00:00 1 1
1 1900-01-01 09:00:00 1 1
2 1900-01-01 10:00:00 2 2
3 1900-01-01 12:00:00 2 3
4 1900-01-01 13:00:00 2 3
5 1900-01-01 18:00:00 2 4
6 1900-01-01 19:00:00 2 3
7 1900-01-01 21:00:00 3 3
8 1900-01-01 22:00:00 3 4
9 1900-01-02 03:00:00 1 3
Answer 0 (score: 1)
Use pandas.DataFrame.groupby. Given df:
Time People
0 1900-01-01 08:00:00 1
1 1900-01-01 09:00:00 1
2 1900-01-01 10:00:00 2
3 1900-01-01 12:00:00 2
4 1900-01-01 12:00:00 3
5 1900-01-01 13:00:00 2
6 1900-01-01 13:00:00 3
7 1900-01-01 18:00:00 2
8 1900-01-01 18:00:00 3
9 1900-01-01 18:00:00 4
10 1900-01-01 19:00:00 2
11 1900-01-01 19:00:00 3
12 1900-01-01 21:00:00 3
13 1900-01-01 22:00:00 3
14 1900-01-01 22:00:00 4
15 1900-01-02 03:00:00 1
16 1900-01-02 03:00:00 2
17 1900-01-02 03:00:00 3
df.groupby('Time')['People'].max()
Returns:
Time
1900-01-01 08:00:00 1
1900-01-01 09:00:00 1
1900-01-01 10:00:00 2
1900-01-01 12:00:00 3
1900-01-01 13:00:00 3
1900-01-01 18:00:00 4
1900-01-01 19:00:00 3
1900-01-01 21:00:00 3
1900-01-01 22:00:00 4
1900-01-02 03:00:00 3
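The grouping above relies on Time already being truncated to the hour. As a minimal sketch, assuming the raw timestamps have already been parsed into a datetime column, the same per-hour maximum can be taken directly by flooring each timestamp to its hour. The frame `raw` and its six rows are a shortened, hypothetical sample in the spirit of the question's data, not the full dataset.

```python
import pandas as pd

# Hypothetical, shortened sample: minute-level timestamps with a People reading
raw = pd.DataFrame({
    'Time': pd.to_datetime([
        '1900-01-01 08:00:00', '1900-01-01 12:29:00', '1900-01-01 12:30:00',
        '1900-01-01 13:00:00', '1900-01-01 13:02:00', '1900-01-01 13:20:00',
    ]),
    'People': [1, 2, 3, 3, 2, 3],
})

# Truncate each timestamp to the start of its hour, then keep the max per hour.
# Only hours that actually occur in the data appear in the result.
hourly_max = (
    raw.groupby(raw['Time'].dt.floor('h'))['People']
       .max()
       .reset_index()
)
print(hourly_max)
```

Grouping on the floored timestamps yields one row per hour present in the data, so no intermediate per-People count is needed when only the maximum is wanted.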
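Answer 1 (score: 1)
For more control over the individual items, you can iterate over the unique Time keys of df, take the max() of the other columns for each key, modify the values as needed, and then rebuild the DataFrame. This should work: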
import pandas as pd
d = ({
'Time' : ['0/1/1900 8:00:00','0/1/1900 9:59:00','0/1/1900 10:00:00','0/1/1900 12:29:00','0/1/1900 12:30:00','0/1/1900 13:00:00','0/1/1900 13:02:00','0/1/1900 13:15:00','0/1/1900 13:20:00','0/1/1900 18:10:00','0/1/1900 18:15:00','0/1/1900 18:20:00','0/1/1900 18:25:00','0/1/1900 18:45:00','0/1/1900 18:50:00','0/1/1900 19:05:00','0/1/1900 19:07:00','0/1/1900 21:57:00','0/1/1900 22:00:00','0/1/1900 22:30:00','0/1/1900 22:35:00','1/1/1900 3:00:00','1/1/1900 3:05:00','1/1/1900 3:20:00','1/1/1900 3:25:00'],
'People' : [1,1,2,2,3,3,2,2,3,3,4,4,3,3,2,2,3,3,4,4,3,3,2,2,1],
})
df = pd.DataFrame(data = d)
df['Time'] = ['/'.join([str(int(x.split('/')[0])+1)] + x.split('/')[1:]) for x in df['Time']]
df['Time'] = pd.to_datetime(df['Time'], format='%d/%m/%Y %H:%M:%S')
# Count rows per (hour, People) pair; 'h' is the hourly alias (uppercase 'H' is deprecated in recent pandas)
df = df.groupby([pd.Grouper(key='Time',freq='h'),df.People]).size().reset_index(name='count')
# Unique hours, sorted so the rebuilt frame comes out in chronological order
single_times = sorted(set(df['Time']))
p, c = [], []
for v in single_times:
    rows = df.loc[df['Time'] == v]
    c.append(rows['count'].max())
    p.append(rows['People'].max())
# Adjust c / p here if needed
dfdata = {
    'Time' : single_times,
    'People' : p,
    'Count' : c
    }
df2 = pd.DataFrame(data = dfdata)
print(df2)
There may be a faster way.
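One candidate for a faster route, offered as a sketch rather than a benchmarked result, is to let groupby compute both maxima in a single pass with named aggregation instead of filtering the frame once per hour. The frame `counted` below is a hypothetical five-row excerpt shaped like the grouped output in the question (Time, People, count).

```python
import pandas as pd

# Hypothetical excerpt shaped like the question's grouped output
counted = pd.DataFrame({
    'Time': pd.to_datetime([
        '1900-01-01 12:00:00', '1900-01-01 12:00:00',
        '1900-01-01 18:00:00', '1900-01-01 18:00:00', '1900-01-01 18:00:00',
    ]),
    'People': [2, 3, 2, 3, 4],
    'count': [1, 1, 1, 3, 2],
})

# One row per hour: the max of People and the max of count, computed independently
df2 = counted.groupby('Time', as_index=False).agg(
    People=('People', 'max'),
    Count=('count', 'max'),
)
print(df2)
```

As in the loop version, the two maxima are taken independently, so the People and Count values in a row may come from different source rows.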