I have the following DataFrame df:
id datetime_event cameraid platenumber
11 2017-05-01T00:00:08 AAA 11A
12 2017-05-01T00:00:08 AAA 223
13 2017-05-01T00:00:08 BBB 11A
14 2017-05-01T00:00:09 BBB 33D
15 2017-05-01T00:00:09 DDD 44F
16 2017-05-01T01:01:00 AAA 44F
17 2017-05-01T01:01:01 BBB 44F
18 2017-05-01T01:01:09 AAA 556
19 2017-05-01T01:01:09 AAA 778
20 2017-05-01T01:01:11 EEE 666
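To make the example reproducible, the table above can be built as a DataFrame along these lines. This is only a sketch: the values are copied from the table, and keeping `datetime_event` as a string column is an assumption based on how it is printed.

```python
import pandas as pd

# Sample data copied from the table above.
# datetime_event is kept as strings here (an assumption); it is parsed later with pd.to_datetime.
df = pd.DataFrame({
    'id': list(range(11, 21)),
    'datetime_event': [
        '2017-05-01T00:00:08', '2017-05-01T00:00:08', '2017-05-01T00:00:08',
        '2017-05-01T00:00:09', '2017-05-01T00:00:09', '2017-05-01T01:01:00',
        '2017-05-01T01:01:01', '2017-05-01T01:01:09', '2017-05-01T01:01:09',
        '2017-05-01T01:01:11',
    ],
    'cameraid': ['AAA', 'AAA', 'BBB', 'BBB', 'DDD', 'AAA', 'BBB', 'AAA', 'AAA', 'EEE'],
    'platenumber': ['11A', '223', '11A', '33D', '44F', '44F', '44F', '556', '778', '666'],
})
print(df)
```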
For each hour of each day, I want to select at most 100 entries with cameraid in (AAA, BBB), such that the same platenumber appears consecutively, first in AAA and then in BBB.
For example, for the sample DataFrame given above, the output would be:
id datetime_event cameraid platenumber
11 2017-05-01T00:00:08 AAA 11A
13 2017-05-01T00:00:08 BBB 11A
16 2017-05-01T01:01:00 AAA 44F
17 2017-05-01T01:01:01 BBB 44F
The first 100 entries for each hour of each day can be extracted as follows:
df = df[df.groupby(pd.to_datetime(df['datetime_event']).dt.floor('h')).cumcount() < 100]
But how do I filter by cameraid and, most importantly, how do I match on platenumber so that the same platenumber value appears consecutively, first in AAA and then in BBB?
Answer 0: (score: 1)
Use filter:
Edit:
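import pandas as pd

# First keep only AAA and BBB rows to reduce the data
df = df[df['cameraid'].isin(['AAA','BBB'])]
# Keep (hour, platenumber) groups whose cameraid sequence is exactly AAA, BBB.
# Comparing lists avoids a NumPy shape-mismatch error for groups of other sizes.
df1 = (df.groupby([pd.to_datetime(df['datetime_event']).dt.floor('h'),'platenumber'])
         .filter(lambda x: x['cameraid'].tolist() == ['AAA','BBB']))
print (df1)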
   id       datetime_event cameraid platenumber
0  11  2017-05-01T00:00:08      AAA         11A
2  13  2017-05-01T00:00:08      BBB         11A
5  16  2017-05-01T01:01:00      AAA         44F
6  17  2017-05-01T01:01:01      BBB         44F
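To try the filter end to end, here is a minimal self-contained sketch on the sample data from the question. The helper name `keep_aaa_then_bbb` is hypothetical and not part of pandas. It assumes a "pair" means a group of rows sharing the same hour and platenumber whose cameraid values are exactly AAA followed by BBB, and it leaves the input DataFrame unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    'id': list(range(11, 21)),
    'datetime_event': [
        '2017-05-01T00:00:08', '2017-05-01T00:00:08', '2017-05-01T00:00:08',
        '2017-05-01T00:00:09', '2017-05-01T00:00:09', '2017-05-01T01:01:00',
        '2017-05-01T01:01:01', '2017-05-01T01:01:09', '2017-05-01T01:01:09',
        '2017-05-01T01:01:11',
    ],
    'cameraid': ['AAA', 'AAA', 'BBB', 'BBB', 'DDD', 'AAA', 'BBB', 'AAA', 'AAA', 'EEE'],
    'platenumber': ['11A', '223', '11A', '33D', '44F', '44F', '44F', '556', '778', '666'],
})

def keep_aaa_then_bbb(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep (hour, platenumber) groups seen as AAA then BBB."""
    # Drop every camera other than AAA and BBB first, so less data is grouped.
    sub = frame[frame['cameraid'].isin(['AAA', 'BBB'])]
    # Floor each timestamp to the start of its hour.
    hour = pd.to_datetime(sub['datetime_event']).dt.floor('h')
    # A plain list comparison is False for groups of any other length or order.
    return sub.groupby([hour, 'platenumber']).filter(
        lambda g: g['cameraid'].tolist() == ['AAA', 'BBB']
    )

result = keep_aaa_then_bbb(df)
print(result)  # rows with id 11, 13, 16 and 17
```

Because `filter` returns rows of the original frame, the result keeps the original row order and index labels.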
Old solution:
# First keep only AAA and BBB rows to reduce the data
df = df[df['cameraid'].isin(['AAA','BBB'])]
# Keep only groups of size 2 whose first value is AAA and second is BBB
def f(x):
    return (len(x) == 2 and
            x['cameraid'].iat[0] == 'AAA' and
            x['cameraid'].iat[1] == 'BBB')
df = df.groupby([pd.to_datetime(df['datetime_event']).dt.floor('h'),'platenumber']).filter(f)
print (df)
   id       datetime_event cameraid platenumber
0  11  2017-05-01T00:00:08      AAA         11A
2  13  2017-05-01T00:00:08      BBB         11A
5  16  2017-05-01T01:01:00      AAA         44F
6  17  2017-05-01T01:01:01      BBB         44F
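The question also asks for at most 100 entries per hour, which neither snippet above enforces. One possible way to add that limit after the pairs have been selected is sketched below. It rests on an assumption: the limit is read as 100 rows, that is 50 AAA/BBB pairs per hour, and pairs are kept whole instead of cutting at the 100th row. The helper name `cap_pairs_per_hour` is hypothetical.

```python
import pandas as pd

def cap_pairs_per_hour(pairs: pd.DataFrame, max_rows: int = 100) -> pd.DataFrame:
    """Hypothetical helper: keep at most max_rows // 2 AAA/BBB pairs per hour.

    Assumes `pairs` already contains only complete AAA/BBB pairs,
    for example the output of the groupby/filter step.
    """
    hour = pd.to_datetime(pairs['datetime_event']).dt.floor('h')
    # Number each plate within its hour by order of first appearance: 0, 1, 2, ...
    pair_no = pairs.groupby(hour)['platenumber'].transform(
        lambda s: pd.factorize(s)[0]
    )
    # Both rows of a pair share the same number, so a pair is never split.
    return pairs[pair_no < max_rows // 2]

# Small demonstration: two interleaved pairs in the same hour, limit of 2 rows.
demo = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'datetime_event': ['2017-05-01T00:00:01', '2017-05-01T00:00:02',
                       '2017-05-01T00:00:03', '2017-05-01T00:00:04'],
    'cameraid': ['AAA', 'AAA', 'BBB', 'BBB'],
    'platenumber': ['P1', 'P2', 'P1', 'P2'],
})
print(cap_pairs_per_hour(demo, max_rows=2))  # keeps only the P1 pair (id 1 and 3)
```

Applying the `cumcount() < 100` filter from the question directly to the paired rows would also limit the count, but it could keep the AAA row of a pair and drop its BBB row when pairs are interleaved, which is why this sketch counts pairs instead.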