I have the following dataframe:
                     some_id
2016-12-26 11:03:10  001
2016-12-26 11:03:13  001
2016-12-26 12:03:13  001
2016-12-26 12:03:13  008
2016-12-27 11:03:10  009
2016-12-27 11:03:13  009
2016-12-27 12:03:13  003
2016-12-27 12:03:13  011
I need to do something like transform('size'), then sort the result and keep the N largest values for each day. For N=2 the result should look like this:
            some_id  size
2016-12-26  001      3
            008      1
2016-12-27  009      2
            003      1
Is there an elegant way to do that in pandas 0.19.x?
Answer 0 (score: 4)
Answer 1 (score: 2)
Setup
from io import StringIO
import pandas as pd
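# Columns are separated by two or more spaces, so the single space inside
# each timestamp does not split the field.
txt = """                     some_id
2016-12-26 11:03:10  001
2016-12-26 11:03:13  001
2016-12-26 12:03:13  001
2016-12-26 12:03:13  008
2016-12-27 11:03:10  009
2016-12-27 11:03:13  009
2016-12-27 12:03:13  003
2016-12-27 12:03:13  011"""
# The header has one field fewer than the rows, so the first field becomes the index.
df = pd.read_csv(StringIO(txt), sep=r'\s{2,}', engine='python')
df.index = pd.to_datetime(df.index)
# read_csv parses the ids as integers; restore the zero-padded strings.
df.some_id = df.some_id.astype(str).str.zfill(3)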
df
                     some_id
2016-12-26 11:03:10  001
2016-12-26 11:03:13  001
2016-12-26 12:03:13  001
2016-12-26 12:03:13  008
2016-12-27 11:03:10  009
2016-12-27 11:03:13  009
2016-12-27 12:03:13  003
2016-12-27 12:03:13  011
Using nlargest
df.groupby(pd.TimeGrouper('D')).some_id.value_counts() \
.groupby(level=0, group_keys=False).nlargest(2)
            some_id
2016-12-26  001        3
            008        1
2016-12-27  009        2
            003        1
Name: some_id, dtype: int64
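pd.TimeGrouper matches the pandas 0.19.x asked about in the question, but it was later deprecated and removed. For readers on a newer pandas, the following is a minimal sketch of the same idea using pd.Grouper(freq='D'). The names counts and top2 are only illustrative, and the frame is built directly rather than parsed from text.

```python
import pandas as pd

# Same sample data as in the question, built directly instead of parsed from text.
idx = pd.to_datetime([
    "2016-12-26 11:03:10", "2016-12-26 11:03:13",
    "2016-12-26 12:03:13", "2016-12-26 12:03:13",
    "2016-12-27 11:03:10", "2016-12-27 11:03:13",
    "2016-12-27 12:03:13", "2016-12-27 12:03:13",
])
df = pd.DataFrame(
    {"some_id": ["001", "001", "001", "008", "009", "009", "003", "011"]},
    index=idx,
)

# pd.Grouper(freq="D") buckets the DatetimeIndex by calendar day.
counts = df.groupby(pd.Grouper(freq="D"))["some_id"].value_counts()

# Keep the two largest counts within each day; group_keys=False avoids
# repeating the date as an extra index level.
top2 = counts.groupby(level=0, group_keys=False).nlargest(2)
print(top2)
```

On 2016-12-27 the ids 003 and 011 are tied with one occurrence each, so which of the two is kept depends on how ties are ordered.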
Answer 2 (score: 2)
You should be able to do this in one line.
df.resample('D')['some_id'].apply(lambda s: s.value_counts().iloc[:2])
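The one-liner relies on value_counts() returning its counts sorted from largest to smallest, so slicing the first two rows of each daily group yields the two most frequent ids. A small sketch of that step on a single day's ids (the variable names day, counts and top2 are illustrative only):

```python
import pandas as pd

# One day's worth of ids from the question (2016-12-26).
day = pd.Series(["001", "001", "001", "008"])

counts = day.value_counts()  # sorted by count, largest first
top2 = counts.iloc[:2]       # the first two rows are the two most frequent ids
print(top2)
```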
Answer 3 (score: 0)
If you already have a size column, you can use the following.
df.groupby('some_id')['size'].value_counts().groupby(level=0).nlargest(2)
Otherwise, you can use this approach.
import pandas as pd
df = pd.DataFrame({'some_id':[1,1,1,8,9,9,3,11],
'some_idx':[26,26,26,26,27,27,27,27]})
sizes = df.groupby(['some_id', 'some_idx']).size()
sizes.groupby(level='some_idx').nlargest(2)
# some_idx  some_id  some_idx
# 26        1        26          3
#           8        26          1
# 27        9        27          2
#           3        27          1
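The output above carries some_idx twice because the second groupby prepends its group key to an index that already contains it. A possible variation, sketched here as an assumption rather than as part of the original answer, passes group_keys=False and puts some_idx first in the initial groupby so that the result reads day first, then id:

```python
import pandas as pd

df = pd.DataFrame({'some_id': [1, 1, 1, 8, 9, 9, 3, 11],
                   'some_idx': [26, 26, 26, 26, 27, 27, 27, 27]})

# Count rows per (some_idx, some_id) pair.
sizes = df.groupby(['some_idx', 'some_id']).size()

# group_keys=False keeps the existing index instead of prepending some_idx again.
top2 = sizes.groupby(level='some_idx', group_keys=False).nlargest(2)
print(top2)
```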